- parsing: dividing a string into tokens based on the given delimiters
- token: one piece of information, a "word"
- delimiter: one (or more) characters used to separate tokens
When there is just one character used as a delimiter
Example 1
We want to divide a phrase into words, where spaces are used to separate the words. For example:
the music made it hard to concentrate
String phrase = "the music made it hard to concentrate";
String delims = "[ ]+";
String[] tokens = phrase.split(delims);
- the general form for specifying the delimiters that we will use is "[delim_characters]+". (This form is a kind of regular expression. You don't need to know about regular expressions; just use the template shown here.) The plus sign (+) indicates that consecutive delimiters should be treated as one.
- the split method returns an array containing the tokens (as strings). To see what the tokens are, just use a for loop:
for (int i = 0; i < tokens.length; i++)
    System.out.println(tokens[i]);
You should find that there are seven tokens: the, music, made, it, hard, to, concentrate
Example 2
Suppose each string contains an employee's last name, first name, employee ID#, and the number of hours worked for each day of the week, separated by commas. So a string might look like:
Smith,Katie,3014,,8.25,6.5,,,10.75,8.5
String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";
String delims = "[,]";
String[] tokens = employee.split(delims);
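Because the + is omitted here, each comma counts as its own delimiter, so an empty field between two commas becomes an empty-string token. The example string yields ten tokens, three of which are empty: Smith, Katie, 3014, "", 8.25, 6.5, "", "", 10.75, 8.5
There is one small wrinkle to be aware of (regardless of how consecutive delimiters are handled): if the string starts with one (or more) delimiters, then the first token will be the empty string (""). Delimiters at the end of the string do not cause the same problem, because split discards trailing empty strings.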
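To see these behaviors side by side, here is a minimal sketch that splits the employee string with and without the +, and then splits a string that begins with a delimiter. The class name SplitEdgeCases and the helper tokens are made up for illustration; only String.split comes from the text above.

```java
import java.util.Arrays;

class SplitEdgeCases {
    // Hypothetical helper: split s on the given delimiter pattern.
    static String[] tokens(String s, String delims) {
        return s.split(delims);
    }

    public static void main(String[] args) {
        String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";

        // Without +, every comma is its own delimiter, so empty fields
        // show up as empty-string tokens.
        System.out.println(Arrays.toString(tokens(employee, "[,]")));

        // With +, a run of commas counts as one delimiter, so the
        // empty fields disappear.
        System.out.println(Arrays.toString(tokens(employee, "[,]+")));

        // A leading delimiter produces an empty first token either way.
        System.out.println(Arrays.toString(tokens(",Smith,Katie", "[,]+")));
    }
}
```

If the empty fields carry meaning (for instance, a day with no hours recorded), the version without + is the one to use, since it keeps every field in its original position.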
When there are several characters being used as delimiters
Example 3
Suppose we have a string containing several English sentences that uses only commas, periods, question marks, and exclamation points as punctuation. We wish to extract the individual words in the string (excluding the punctuation). In this situation we have several delimiters (the punctuation marks as well as spaces), and we want to treat consecutive delimiters as one:
String str = "This is a sentence. This is a question, right? Yes! It is.";
String delims = "[ .,?!]+";
String[] tokens = str.split(delims);
Example 4
Suppose we are representing arithmetic expressions using strings and wish to parse out the operands (that is, use the arithmetic operators as delimiters). The arithmetic operators that we will allow are addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (^), and we will not allow parentheses (to make it a little simpler). This situation is not as straightforward as it might seem. Several characters have a special meaning when they appear inside [ ]: ^ - [ and two &s in a row (&&), as well as ] and the backslash itself. To use one of these characters as a delimiter, we need to put \\ in front of it (in a Java string literal, \\ stands for the single backslash that the regular expression needs):
String expr = "2*x^3 - 4/5*y + z^2";
String delims = "[+\\-*/\\^ ]+"; // so the delimiters are: + - * / ^ space
String[] tokens = expr.split(delims);
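The sketch below runs Example 4 and also shows why the escape in front of - matters. It assumes a hypothetical class OperandSplit with two illustrative helpers, operands and isRejected; neither is part of the original example. Left unescaped between + and *, the - is read as a character range from + to *, which Java rejects as an invalid pattern.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

class OperandSplit {
    // Delimiters: + - * / ^ and space; - and ^ are escaped inside [ ].
    static final String DELIMS = "[+\\-*/\\^ ]+";

    // Hypothetical helper: return the operands of an expression string.
    static String[] operands(String expr) {
        return expr.split(DELIMS);
    }

    // Hypothetical helper: true if split rejects the delimiter pattern
    // as an invalid regular expression.
    static boolean isRejected(String delims) {
        try {
            "1+2".split(delims);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Operands of the expression from Example 4.
        System.out.println(Arrays.toString(operands("2*x^3 - 4/5*y + z^2")));

        // Same delimiters with the escapes removed: +-* is parsed as a range.
        System.out.println(isRejected("[+-*/^ ]+"));
    }
}
```

For the expression in Example 4 this yields eight operands: 2, x, 3, 4, 5, y, z, 2.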
General template for using split
String s = string_to_parse;
String delims = "[delimiters]+"; // use + to treat consecutive delims as one;
// omit to treat consecutive delims separately
String[] tokens = s.split(delims);
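One way to package the template is as a small helper method. This is only a sketch: the class SplitTemplate, the method tokenize, and its mergeConsecutive flag are hypothetical names, and the helper assumes the caller has already escaped any special characters (such as - or ^) in delimChars.

```java
import java.util.Arrays;

class SplitTemplate {
    // Builds "[delimChars]" or "[delimChars]+" and splits s on it.
    // delimChars is placed inside [ ] unchanged, so special characters
    // must already be escaped by the caller.
    static String[] tokenize(String s, String delimChars, boolean mergeConsecutive) {
        String delims = "[" + delimChars + "]" + (mergeConsecutive ? "+" : "");
        return s.split(delims);
    }

    public static void main(String[] args) {
        // Consecutive spaces treated as one delimiter.
        System.out.println(Arrays.toString(tokenize("the  music made", " ", true)));

        // Consecutive commas treated separately, keeping the empty field.
        System.out.println(Arrays.toString(tokenize("a,,b", ",", false)));
    }
}
```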