Cocoa: Explode or break an NSString into individual words

Posted October 20, 2008 by Quinn McHenry in Computer programming

Breaking a string of text into its component words is a prerequisite for searching text and for many other kinds of text processing. This task is easy in Cocoa/Objective-C, although finding the right methods requires digging through a few class references in the documentation. If you need a more elaborate breakdown of a string, this code will at least give you a starting point.

To break the NSString bigString into an NSArray containing the individual words separated by whitespace, use:

NSString *bigString = @"not really that big";
// Split on spaces and tabs; words holds the resulting NSString pieces in order
NSArray *words = [bigString componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];

The heart of this operation is NSString's componentsSeparatedByCharactersInSet: method, which breaks bigString into an NSArray of NSStrings. The word boundaries are defined by the NSCharacterSet returned by the class method whitespaceCharacterSet, which contains the space and tab characters (strictly, the Unicode Zs space separators plus the horizontal tab). To split on the various Unicode newline characters as well, call whitespaceAndNewlineCharacterSet instead of whitespaceCharacterSet in the example above.
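To see the difference between the two character sets, here is a minimal sketch that splits the same string both ways. The helper names SplitOnWhitespace and SplitOnWhitespaceAndNewlines are hypothetical, chosen only for illustration; the calls inside them are the same NSString and NSCharacterSet methods used above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits only on spaces and tabs (whitespaceCharacterSet),
// so a newline stays embedded inside the neighboring "word".
static NSArray *SplitOnWhitespace(NSString *text) {
    return [text componentsSeparatedByCharactersInSet:
                 [NSCharacterSet whitespaceCharacterSet]];
}

// Hypothetical helper: splits on spaces, tabs, and Unicode newline characters.
static NSArray *SplitOnWhitespaceAndNewlines(NSString *text) {
    return [text componentsSeparatedByCharactersInSet:
                 [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

With the input @"not really\nthat big", the first helper returns three elements, the middle one being @"really\nthat", while the second returns the four words not, really, that, and big.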

Of course, words can be separated by more than whitespace and newlines. The punctuationCharacterSet class method of NSCharacterSet returns a set of punctuation characters. To perform a proper detonation of grammatical text into its constituent words separated by whitespace, newlines, and punctuation, create a character set that is the union of the punctuation set and the whitespace-and-newline set:

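// Start from a mutable punctuation set so other sets can be merged into it
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
// Add spaces, tabs, and newlines to the separators
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [bigString componentsSeparatedByCharactersInSet:separators];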
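One caveat: componentsSeparatedByCharactersInSet: does not collapse runs of separators. Adjacent separator characters, such as the comma and space in "Hello, world!", produce empty strings in the result, and a separator at the start or end of the string produces an empty first or last element. The sketch below, built around a hypothetical helper named WordsInString, uses the same character set as above and simply drops the empty components.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text on punctuation, whitespace, and newlines,
// then drops the empty strings that adjacent separators produce.
static NSArray *WordsInString(NSString *text) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    NSArray *pieces = [text componentsSeparatedByCharactersInSet:separators];

    // Keep only the non-empty components.
    NSMutableArray *words = [NSMutableArray arrayWithCapacity:[pieces count]];
    for (NSString *piece in pieces) {
        if ([piece length] > 0) {
            [words addObject:piece];
        }
    }
    return words;
}
```

For @"Hello, world!" the raw split returns four elements (@"Hello", @"", @"world", @""), while WordsInString returns just @"Hello" and @"world". Keep in mind that the apostrophe is in the punctuation set too, so a contraction like "don't" comes back as two pieces, "don" and "t".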


About Quinn McHenry

Quinn was one of the original co-founders of Tech-Recipes. He is currently crafting iOS applications as a senior developer at Small Planet Digital in Brooklyn, New York.