Thursday, October 28, 2010

String to "tokens"

It's common to have to split a string into separate "tokens". These tokens can be words, numbers, commands, or whatever. There are several ways to do this, and the best ones all use regular expressions.
  • String split(regex) - Probably the easiest.
  • java.util.Scanner - This can "read" from strings. Very general.
  • java.util.regex Pattern and Matcher - These use regular expressions directly. The most powerful solution.
  • java.util.StringTokenizer - This has been superseded by regular expressions, Scanner, and split().

Easy - String split(...) method

The easiest way to split a string into separate "tokens" is to use the split(...) method. For example,
// Requires: import java.util.Arrays;
String test = "It's the number 1 way.";
String[] tokens = test.split(" ");       // Single blank is the separator.
System.out.println(Arrays.toString(tokens));
Produces the following output:
[It's, the, number, 1, way.]

Good - Scanner

Scanner can be used to "read" strings. Here's the previous example using Scanner, but because it doesn't produce arrays, the results are added to an ArrayList.
// Requires: import java.util.ArrayList; import java.util.Scanner;
String test = "It's the number 1 way.";
ArrayList<String> tokens = new ArrayList<String>();
  
Scanner tokenize = new Scanner(test);
while (tokenize.hasNext()) {
    tokens.add(tokenize.next());
}
System.out.println(tokens);
Produces the same output as above:
[It's, the, number, 1, way.]
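Neither approach cares about what makes up a "token", only about what separates the tokens. With split(" "), tokens must be separated by single blanks. To allow one or more blanks as a separator, use " +", which means one or more blanks. Scanner, by default, already treats any run of whitespace as a single separator.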
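To make the difference between " " and " +" concrete, here is a small sketch using split(...). The class name SplitBlanksDemo and the helper splitOnBlanks are made up for this illustration, and the test string is the earlier one with extra blanks added.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class SplitBlanksDemo {

    // Splits on runs of one or more blanks rather than on exactly one blank.
    static String[] splitOnBlanks(String text) {
        return text.split(" +");
    }

    public static void main(String[] args) {
        String test = "It's  the   number 1 way.";   // Note the repeated blanks.

        // A single-blank separator yields empty tokens between adjacent blanks.
        System.out.println(Arrays.toString(test.split(" ")));
        // prints [It's, , the, , , number, 1, way.]

        // " +" treats each run of blanks as one separator.
        System.out.println(Arrays.toString(splitOnBlanks(test)));
        // prints [It's, the, number, 1, way.]
    }
}
```

One thing to watch: if the string starts with a blank, split(" +") still returns an empty first element, so you may want to trim() the string first.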
Scanner has numerous methods for working more generally with regular expressions to identify the token you want to read or the delimiters you want to skip.
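As one example of those methods, useDelimiter(...) takes a regular expression describing what to skip between tokens. The sketch below assumes comma-separated input with optional whitespace around each comma. The class name ScannerDelimiterDemo and the method tokenize are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of any library.
class ScannerDelimiterDemo {

    // Reads tokens separated by a comma with optional whitespace on either side.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Scanner scanner = new Scanner(text);
        scanner.useDelimiter("\\s*,\\s*");   // Regex: optional blanks, comma, optional blanks.
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        scanner.close();
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("red, green ,blue"));
        // prints [red, green, blue]
    }
}
```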
Numbers. One advantage of using a Scanner is that you can easily switch back and forth between reading strings and numbers.
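Here is a rough sketch of what that switching can look like: hasNextInt() checks whether the next token is an integer, so the loop can read numbers with nextInt() and skip everything else with next(). The class name ScannerNumbersDemo and the method sumInts are made up for the example.

```java
import java.util.Scanner;

// Hypothetical demo class, not part of any library.
class ScannerNumbersDemo {

    // Adds up the integer tokens in the text and skips all other tokens.
    static int sumInts(String text) {
        Scanner scanner = new Scanner(text);
        int sum = 0;
        while (scanner.hasNext()) {
            if (scanner.hasNextInt()) {
                sum += scanner.nextInt();   // Next token is an integer: read it as an int.
            } else {
                scanner.next();             // Otherwise read it as a string and discard it.
            }
        }
        scanner.close();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("apples 3 pears 4"));
        // prints 7
    }
}
```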
