
StringSearch Class Reference

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

#include <stsearch.h>

Inheritance diagram for StringSearch:

SearchIterator

Public Methods

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument locale.

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument collator.

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument locale.

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument collator.

 StringSearch (const StringSearch &that)
 Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.

virtual ~StringSearch (void)
 Destructor.

StringSearch & operator= (const StringSearch &that)
 Assignment operator.

virtual UBool operator== (const SearchIterator &that) const
 Equality operator.

virtual void setOffset (UTextOffset position, UErrorCode &status)
 Sets the index to point to the given position, and clears any state that's affected.

virtual UTextOffset getOffset (void) const
 Return the current index in the text being searched.

virtual void setText (const UnicodeString &text, UErrorCode &status)
 Set the target text to be searched.

virtual void setText (CharacterIterator &text, UErrorCode &status)
 Set the target text to be searched.

RuleBasedCollator * getCollator () const
 Gets the collator used for the language rules.

void setCollator (RuleBasedCollator *coll, UErrorCode &status)
 Sets the collator used for the language rules.

void setPattern (const UnicodeString &pattern, UErrorCode &status)
 Sets the pattern used for matching.

const UnicodeString & getPattern () const
 Gets the search pattern.

virtual void reset ()
 Reset the iteration.

virtual SearchIterator * safeClone (void) const
 Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.


Protected Methods

virtual UTextOffset handleNext (UTextOffset position, UErrorCode &status)
 Search forward for matching text, starting at a given location.

virtual UTextOffset handlePrev (UTextOffset position, UErrorCode &status)
 Search backward for matching text, starting at a given location.


Detailed Description

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

StringSearch handles language-specific peculiarities. For example, with the German collator, the characters ß and SS match when case is ignored. See the "ICU Collation Design Document" for more information.

The algorithm implemented is a modified form of the Boyer-Moore search. For more information on the algorithm, see "Efficient Text Searching in Java", published in Java Report in February 1999.
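
For readers unfamiliar with this family of algorithms, the following is a minimal sketch of the Horspool simplification of Boyer-Moore over raw bytes. It is not ICU's implementation: StringSearch compares collation elements rather than bytes, and the function name horspoolFindAll is hypothetical. The sketch only illustrates the core idea of comparing right to left and skipping ahead using a bad-character shift table.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Simplified Boyer-Moore-Horspool search over raw bytes.
// Illustration only: ICU's StringSearch works on collation elements instead.
// Returns the start offset of every (possibly overlapping) match.
std::vector<std::size_t> horspoolFindAll(const std::string &text,
                                         const std::string &pattern) {
    std::vector<std::size_t> matches;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || m > n) {
        return matches;  // nothing to search for, or pattern cannot fit
    }

    // Bad-character table: how far the window may shift, keyed by the text
    // byte currently aligned with the last pattern position.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the window against the pattern from right to left.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) {
            --j;
        }
        if (j == 0) {
            matches.push_back(pos);
        }
        // Skip ahead based on the last byte of the current window.
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return matches;
}
```

Calling horspoolFindAll("abracadabra", "abra") yields the offsets 0 and 7.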

There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end <start, end>.
A pattern string P matches a text string S at the offsets <start, end> if

 
 option 1. Some canonical equivalent of P matches some canonical equivalent 
           of S'
 option 2. P matches S' and if P starts or ends with a combining mark, 
           there exists no non-ignorable combining mark before or after S' 
           in S respectively. 
 
Option 2 is the default.
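
The boundary condition of option 2 can be sketched as follows. This is a simplified illustration, not ICU code: it compares code points instead of collation elements, it treats only the Combining Diacritical Marks block (U+0300 to U+036F) as combining marks, and it ignores the distinction between ignorable and non-ignorable marks. The names isCombiningMark and matchesOption2 are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Simplifying assumption: only U+0300..U+036F count as combining marks.
bool isCombiningMark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Returns true if pattern occurs in text at offset start and the option 2
// boundary condition holds for that occurrence.
bool matchesOption2(const std::u32string &text, const std::u32string &pattern,
                    std::size_t start) {
    const std::size_t m = pattern.size();
    if (m == 0 || start > text.size() || m > text.size() - start) {
        return false;
    }
    // "P matches S'": plain code point comparison in this sketch.
    if (text.compare(start, m, pattern) != 0) {
        return false;
    }
    const std::size_t end = start + m;
    // If P starts with a combining mark, no combining mark may precede S'.
    if (isCombiningMark(pattern.front()) && start > 0 &&
        isCombiningMark(text[start - 1])) {
        return false;
    }
    // If P ends with a combining mark, no combining mark may follow S'.
    if (isCombiningMark(pattern.back()) && end < text.size() &&
        isCombiningMark(text[end])) {
        return false;
    }
    return true;
}
```

With the text U"a\u0301\u0302b", the pattern U"\u0301" is rejected at offset 1 because another combining mark follows it, while the pattern U"\u0301\u0302" is accepted at the same offset.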

This search has APIs similar to those of other text iteration mechanisms, such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. The search iterator allows the direction to be changed by calling reset followed by next or previous. A direction change can also occur without calling reset first, but this comes with some speed penalty. Matches found in the forward direction are the same as those found in the backward direction, in reverse order.
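
The relationship between forward and backward results can be illustrated without ICU, using std::string::find and std::string::rfind as stand-ins for next and previous. This is only an analogy under stated assumptions: the helper names forwardMatches and backwardMatches are hypothetical, the comparison is byte-wise rather than collation-based, and overlapping matches are reported, which may differ from how a StringSearch instance is configured.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects match offsets scanning from the start of the text (like next).
std::vector<std::size_t> forwardMatches(const std::string &text,
                                        const std::string &pattern) {
    std::vector<std::size_t> out;
    if (pattern.empty()) {
        return out;
    }
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        out.push_back(pos);
    }
    return out;
}

// Collects match offsets scanning from the end of the text (like previous).
std::vector<std::size_t> backwardMatches(const std::string &text,
                                         const std::string &pattern) {
    std::vector<std::size_t> out;
    if (pattern.empty()) {
        return out;
    }
    std::size_t pos = text.rfind(pattern);
    while (pos != std::string::npos) {
        out.push_back(pos);
        if (pos == 0) {
            break;  // cannot step further back
        }
        pos = text.rfind(pattern, pos - 1);
    }
    return out;
}
```

For the text "abcabcabc" and the pattern "abc", the forward scan reports offsets 0, 3, 6 and the backward scan reports 6, 3, 0.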

SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Since the starting position is set exactly as specified, note that there are some danger points at which the search may return incorrect results: