Data utilities

Data utilities — Functions for coalescing, merging, date handling and normalizing

Stability Level

Stable, unless otherwise indicated

Synopsis

#include <libtracker-extract/tracker-extract.h>

gchar *             tracker_coalesce                    (gint n_values,
                                                         ...);
gchar *             tracker_merge                       (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);
gchar *             tracker_text_normalize              (const gchar *text,
                                                         guint max_words,
                                                         guint *n_words);
gchar *             tracker_date_format_to_iso8601      (const gchar *date_string,
                                                         const gchar *format);
gchar *             tracker_date_guess                  (const gchar *date_string);

Description

This API provides general-purpose helper functions which extractors may find useful. The in-house extractors also use these functions quite frequently.

Details

tracker_coalesce ()

gchar *             tracker_coalesce                    (gint n_values,
                                                         ...);

This function iterates through a series of string pointers passed using Varargs and returns the first which is not NULL, not empty (i.e. "") and not made up solely of one or more spaces (i.e. " ").

The returned value is stripped using g_strstrip(). All other values supplied are freed. Because every argument is either stripped or freed, it is MOST important NOT to pass constant string pointers to this function!
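The ownership rules above are easier to see in code. What follows is a minimal sketch in plain C, not the library's implementation: sketch_coalesce(), sketch_strip() and sketch_is_blank() are hypothetical names, malloc()/free() stand in for the GLib allocator, and only the space character is trimmed, whereas g_strstrip() removes other whitespace as well.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: trims leading and trailing spaces in place,
 * similar in spirit to g_strstrip(). */
static char *
sketch_strip (char *s)
{
    char *start = s;
    size_t len;

    while (*start == ' ')
        start++;
    len = strlen (start);
    while (len > 0 && start[len - 1] == ' ')
        len--;
    memmove (s, start, len);
    s[len] = '\0';
    return s;
}

/* Hypothetical helper: true for NULL, "" and strings made only of spaces. */
static int
sketch_is_blank (const char *s)
{
    if (s == NULL)
        return 1;
    while (*s == ' ')
        s++;
    return *s == '\0';
}

/* Hypothetical stand-in for tracker_coalesce(), written from the
 * description above. Every argument must be heap-allocated (or NULL). */
char *
sketch_coalesce (int n_values, ...)
{
    va_list args;
    char *result = NULL;
    int i;

    va_start (args, n_values);
    for (i = 0; i < n_values; i++) {
        char *value = va_arg (args, char *);

        if (result == NULL && !sketch_is_blank (value))
            result = sketch_strip (value);  /* first usable value is kept */
        else
            free (value);                   /* all other values are released */
    }
    va_end (args);
    return result;
}
```

With the real function, callers would typically pass copies made with g_strdup() and release the result with g_free(); passing a string literal would end in an attempt to modify or free read-only memory.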

n_values :

the number of Varargs supplied

... :

the string pointers to coalesce

Returns :

the first string pointer from those provided which matches, otherwise NULL.

Since 0.8


tracker_merge ()

gchar *             tracker_merge                       (const gchar *delimiter,
                                                         gint n_values,
                                                         ...);

This function iterates through a series of string pointers passed using Varargs and returns a newly allocated string of the merged strings.

The delimiter can be NULL. If specified, it is inserted between each pair of merged strings in the result.
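As an illustration of the merging rule, here is a minimal sketch in plain C. It is not the library's code: sketch_merge() is a hypothetical name, and two behaviours are assumptions that the text above does not spell out, namely that NULL entries are skipped and that NULL is returned when nothing was merged. The sketch also leaves its inputs untouched, since the text does not say who owns them.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tracker_merge(). The caller frees the result. */
char *
sketch_merge (const char *delimiter, int n_values, ...)
{
    va_list args;
    char *result = NULL;
    size_t len = 0;
    size_t delim_len = delimiter != NULL ? strlen (delimiter) : 0;
    int i;

    va_start (args, n_values);
    for (i = 0; i < n_values; i++) {
        const char *value = va_arg (args, char *);
        int first = (result == NULL);
        size_t value_len;
        char *grown;

        if (value == NULL)
            continue;               /* assumption: NULL entries are skipped */

        value_len = strlen (value);
        grown = realloc (result, len + (first ? 0 : delim_len) + value_len + 1);
        if (grown == NULL) {        /* out of memory: give up cleanly */
            free (result);
            result = NULL;
            break;
        }
        result = grown;

        if (!first && delim_len > 0) {
            memcpy (result + len, delimiter, delim_len);
            len += delim_len;       /* delimiter goes between values only */
        }
        memcpy (result + len, value, value_len);
        len += value_len;
        result[len] = '\0';
    }
    va_end (args);
    return result;
}
```

Under these assumptions, sketch_merge (", ", 3, "Jazz", (char *) NULL, "Blues") yields "Jazz, Blues", and a NULL delimiter simply concatenates the values.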

delimiter :

the delimiter to use when merging

n_values :

the number of Varargs supplied

... :

the string pointers to merge

Returns :

a newly-allocated string holding the result which should be freed with g_free() when finished with, otherwise NULL.

Since 0.8


tracker_text_normalize ()

gchar *             tracker_text_normalize              (const gchar *text,
                                                         guint max_words,
                                                         guint *n_words);

This function iterates through text checking for UTF-8 validity using g_utf8_get_char_validated(). For each character found, the GUnicodeType is checked to make sure it is one of the following values:

  • G_UNICODE_LOWERCASE_LETTER

  • G_UNICODE_MODIFIER_LETTER

  • G_UNICODE_OTHER_LETTER

  • G_UNICODE_TITLECASE_LETTER

  • G_UNICODE_UPPERCASE_LETTER

All other symbols, punctuation, marks, numbers and separators are stripped. A regular space (i.e. " ") is used to separate the words in the returned string.

The n_words argument can be NULL. If specified, it will be populated with the number of words that were normalized in the result.
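The word-splitting idea can be sketched without GLib. The version below is a deliberately simplified illustration, not the library's implementation: sketch_text_normalize() is a hypothetical name, it recognises ASCII letters only via isalpha() instead of checking the GUnicodeType of each UTF-8 character, and returning an empty string for input without letters is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, ASCII-only stand-in for tracker_text_normalize().
 * The caller frees the result. */
char *
sketch_text_normalize (const char   *text,
                       unsigned int  max_words,
                       unsigned int *n_words)
{
    char *out;
    size_t len = 0;
    unsigned int words = 0;
    int in_word = 0;
    const char *p;

    if (n_words != NULL)
        *n_words = 0;
    if (text == NULL)
        return NULL;

    /* The output is never longer than the input. */
    out = malloc (strlen (text) + 1);
    if (out == NULL)
        return NULL;

    for (p = text; *p != '\0'; p++) {
        if (isalpha ((unsigned char) *p)) {
            if (!in_word) {
                if (words == max_words)
                    break;              /* word budget used up */
                if (words > 0)
                    out[len++] = ' ';   /* one regular space between words */
                words++;
                in_word = 1;
            }
            out[len++] = *p;
        } else {
            in_word = 0;    /* digits, punctuation and separators are dropped */
        }
    }
    out[len] = '\0';

    if (n_words != NULL)
        *n_words = words;   /* optional out-parameter */
    return out;
}
```

Given "Hello, world! 42 times" and a generous max_words, this sketch produces "Hello world times" and reports 3 words; with max_words set to 2 it stops after "Hello world".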

text :

the text to normalize

max_words :

the maximum number of words of text to normalize

n_words :

the number of words actually normalized

Returns :

a newly-allocated string holding the result which should be freed with g_free() when finished with, otherwise NULL.

Since 0.8


tracker_date_format_to_iso8601 ()

gchar *             tracker_date_format_to_iso8601      (const gchar *date_string,
                                                         const gchar *format);

This function uses strptime() to fill in a struct tm from date_string, parsed according to format.
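The strptime() round trip might look roughly like the sketch below. It is an illustration under stated assumptions rather than the library's code: sketch_format_to_iso8601() is a hypothetical name, the output layout "%Y-%m-%dT%H:%M:%S" is an assumption, and no time zone conversion is attempted.

```c
#define _XOPEN_SOURCE 700   /* exposes strptime() on glibc */
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for tracker_date_format_to_iso8601().
 * The caller frees the result. */
char *
sketch_format_to_iso8601 (const char *date_string,
                          const char *format)
{
    struct tm tm;
    char buf[32];
    char *copy;

    if (date_string == NULL || format == NULL)
        return NULL;

    /* Start from a zeroed structure so unparsed fields are well defined. */
    memset (&tm, 0, sizeof tm);
    if (strptime (date_string, format, &tm) == NULL)
        return NULL;        /* input did not match the format */

    /* Assumed output layout; the time zone is deliberately left out. */
    if (strftime (buf, sizeof buf, "%Y-%m-%dT%H:%M:%S", &tm) == 0)
        return NULL;

    copy = malloc (strlen (buf) + 1);
    if (copy != NULL)
        strcpy (copy, buf);
    return copy;
}
```

For example, the Exif-style string "2005:04:29 14:56:54" parsed with the format "%Y:%m:%d %H:%M:%S" comes back as "2005-04-29T14:56:54" in this sketch.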

date_string :

the date in a string pointer

format :

the format of the date_string

Returns :

a newly-allocated string with the time represented in ISO8601 date format which should be freed with g_free() when finished with, otherwise NULL.

Since 0.8


tracker_date_guess ()

gchar *             tracker_date_guess                  (const gchar *date_string);

This function uses a number of methods to try to guess the date held in date_string. The date_string must be at least 5 characters long for any guessing to be attempted. Some of the string formats guessed include:

  • "YYYY-MM-DD" (Simple format)

  • "20050315113224-08'00'" (PDF format)

  • "20050216111533Z" (PDF format)

  • "Mon Feb 9 10:10:00 2004" (Microsoft Office format)

  • "2005:04:29 14:56:54" (Exif format)

  • "YYYY-MM-DDThh:mm:ss.ff+zz:zz"
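One way to picture the guessing is as a chain of pattern attempts. The sketch below is a simplified illustration and not the library's algorithm: sketch_date_guess() is a hypothetical name, it covers only the simple, Exif and PDF layouts from the list above, it ignores any time zone suffix, and the output layout (with midnight for date-only input) is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tracker_date_guess(). The caller frees the result. */
char *
sketch_date_guess (const char *date_string)
{
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, s = 0;
    char *out;

    /* No guessing is attempted below 5 characters. */
    if (date_string == NULL || strlen (date_string) < 5)
        return NULL;

    if (sscanf (date_string, "%4d:%2d:%2d %2d:%2d:%2d",
                &y, &mo, &d, &h, &mi, &s) == 6) {
        /* Exif: "2005:04:29 14:56:54" */
    } else if (sscanf (date_string, "%4d-%2d-%2d", &y, &mo, &d) == 3) {
        /* Simple: "YYYY-MM-DD", time assumed to be midnight */
        h = mi = s = 0;
    } else if (sscanf (date_string, "%4d%2d%2d%2d%2d%2d",
                       &y, &mo, &d, &h, &mi, &s) == 6) {
        /* PDF: "20050216111533Z"; any offset suffix is ignored here */
    } else {
        return NULL;
    }

    /* Reject values that cannot form a calendar date or clock time. */
    if (y < 0 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 60)
        return NULL;

    out = malloc (32);
    if (out != NULL)
        snprintf (out, 32, "%04d-%02d-%02dT%02d:%02d:%02d", y, mo, d, h, mi, s);
    return out;
}
```

The order of the attempts matters in this sketch: the dashed pattern is tried before the digits-only PDF pattern, because "%2d" would otherwise accept a "-" as a sign and misread a dashed date.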

date_string :

the date in a string pointer

Returns :

a newly-allocated string with the time represented in ISO8601 date format which should be freed with g_free() when finished with, otherwise NULL.

Since 0.8