Wednesday, May 4, 2011

Introduction to regular expressions in Java

Regular expressions are sequences of characters and symbols that define a set of strings. They are useful for validating input and ensuring that data is in a particular format. For example, a ZIP code must consist of five digits, and a last name must contain only letters, spaces, apostrophes and hyphens.
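As a rough illustration of the two validation rules just mentioned, the sketch below checks a ZIP code and a last name. The class name InputValidation and its helper methods are made up for this example, and the ZIP rule assumes the plain five-digit form only.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class InputValidation {
    // Exactly five digits (assumption: no "+4" extension is allowed).
    static final Pattern ZIP = Pattern.compile("\\d{5}");
    // One or more letters, spaces, apostrophes or hyphens.
    // The hyphen is placed last in the set so it is read literally.
    static final Pattern LAST_NAME = Pattern.compile("[a-zA-Z' -]+");

    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    static boolean isLastName(String s) {
        return LAST_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isZip("90210"));             // true
        System.out.println(isZip("9021"));              // false, only four digits
        System.out.println(isLastName("O'Neil-Smith")); // true
        System.out.println(isLastName("Smith3"));       // false, contains a digit
    }
}
```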

Operations with regular expressions

Several operations can be performed with the help of regular expressions:
  • Searching a text for a regex
  • Splitting a text at the places where the regex matches
  • Replacing the parts of a text that the regex matches with other text
  • Counting the number of times the regex is found in a text
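The four operations listed above could look roughly like this in Java. This is only a sketch: the class name RegexOperations, the count helper and the sample text are invented for the illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the four operations.
class RegexOperations {
    // Counts how many non-overlapping matches of regex occur in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one1two22three333";

        // Searching: is there at least one run of digits?
        System.out.println(Pattern.compile("\\d+").matcher(text).find()); // true

        // Splitting: cut the text at every run of digits.
        System.out.println(Arrays.toString(text.split("\\d+"))); // [one, two, three]

        // Replacing: swap every run of digits for a dash.
        System.out.println(text.replaceAll("\\d+", "-")); // one-two-three-

        // Counting: number of digit runs.
        System.out.println(count("\\d+", text)); // 3
    }
}
```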


Regular expression symbols

Let X and Z be two regular expressions to be searched.

Symbol      Description
.           Matches any single character
^X          X must match at the beginning of the line
X$          X must match at the end of the line
[abc]       Set definition: matches a, b or c. Note that it matches only one character.
[^abc]      A "^" as the first character inside [] negates the set: matches any one character except a, b or c
[abc][vz]   Two set definitions in sequence: matches a, b or c followed by either v or z
[a-d]       Inclusive range: matches one character from a to d, i.e. a, b, c or d
[a-d1-3]    Union of two ranges: matches one character that is either a letter from a to d or a digit from 1 to 3, i.e. 1, 2 or 3
X|Z         Finds X or Z
XZ          Finds X directly followed by Z
$           Checks whether a line end follows

Java also supports predefined patterns as well as quantifiers. Read here for more on this.
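To make the symbols more concrete, here is a small sketch that tries several of them on short strings. The class name SymbolDemo and the found helper are invented for this example. Note that in Java a set nested inside another set, as in [abc[vz]], is a union and still matches a single character, whereas two sets written one after the other, as in [abc][vz], match two characters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the symbols in the table.
class SymbolDemo {
    // True if regex matches somewhere inside text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^he", "hello"));        // true, "he" is at the start
        System.out.println(found("^lo", "hello"));        // false, "lo" is not at the start
        System.out.println(found("lo$", "hello"));        // true, "lo" is at the end
        System.out.println("hello".matches("h.llo"));     // true, "." stands for "e"
        System.out.println("b".matches("[abc]"));         // true
        System.out.println("d".matches("[^abc]"));        // true, d is not a, b or c
        System.out.println("bz".matches("[abc][vz]"));    // true, two sets in sequence
        System.out.println("v".matches("[abc[vz]]"));     // true, nested set is a union
        System.out.println("3".matches("[a-d1-3]"));      // true
        System.out.println("4".matches("[a-d1-3]"));      // false, 4 is outside 1-3
        System.out.println("dog".matches("cat|dog"));     // true
    }
}
```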

Built-in support for Regex with String in Java

Class String provides several methods for performing regular expression operations.
Three of the methods provided by String are:
  • s.matches("regex")
  • s.split("regex")
  • s.replaceAll("regex", "replacement")
matches() returns true only if the WHOLE string s matches the regex.
split() creates an array of the substrings of s, divided at each occurrence of "regex". The text matched by "regex" is not included in the result.
replaceAll() replaces every match of "regex" with "replacement". Note that s.replace() treats its first argument as literal text, not as a regex.
See here for regular expressions with strings in java.
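The following sketch exercises the String methods on one sample value. The class name StringRegexDemo and the maskDigits helper are made up for the illustration. It also shows why replaceAll(), rather than replace(), is the method to reach for when the first argument is meant as a regex.

```java
import java.util.Arrays;

// Hypothetical demo class for the regex-aware String methods.
class StringRegexDemo {
    // Replaces every digit with '#', using the regex-based replaceAll().
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        String s = "a1b2c3";

        // matches() must cover the whole string, not just a part of it.
        System.out.println(s.matches("[a-c]"));        // false
        System.out.println(s.matches("([a-c]\\d)+"));  // true

        // split() drops the matched digits from the result.
        System.out.println(Arrays.toString(s.split("\\d"))); // [a, b, c]

        // replaceAll() interprets "\\d" as a regex.
        System.out.println(maskDigits(s));             // a#b#c#

        // replace() looks for the literal two characters \d and finds none.
        System.out.println(s.replace("\\d", "#"));     // a1b2c3
    }
}
```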

Using the Pattern and Matcher classes in regular expressions

For advanced regular expressions, the classes java.util.regex.Pattern and java.util.regex.Matcher are used.
See here for Pattern class and here for Matcher class.

The following steps are followed to get regular expression matches in a text.
1. Compile the pattern
2. Create a matcher object for the text and perform various operations with it, like find, group, replaceFirst and replaceAll.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String source = "hello mr. DJ, i like only PJ";
Pattern pattern = Pattern.compile("\\w+");
// To ignore case when matching, pass the CASE_INSENSITIVE flag instead:
// Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(source);
// Check all occurrences
while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        // group() returns the text of the current match
        System.out.println(matcher.group());
}
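A self-contained variation on the snippet above, using the same source string, might look like the sketch below. It collects the matched words with group() and rewrites parts of the text with Matcher.replaceAll(). The class name MatcherDemo and the helpers words and maskInitials are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for Pattern and Matcher.
class MatcherDemo {
    // Step 1: compile the patterns once and reuse them.
    static final Pattern WORD = Pattern.compile("\\w+");
    static final Pattern TWO_CAPITALS = Pattern.compile("[A-Z]{2}");

    // Step 2: walk over the matches with find() and read each one with group().
    static List<String> words(String source) {
        Matcher matcher = WORD.matcher(source);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Replaces every pair of capital letters with "XX".
    static String maskInitials(String source) {
        return TWO_CAPITALS.matcher(source).replaceAll("XX");
    }

    public static void main(String[] args) {
        String source = "hello mr. DJ, i like only PJ";
        System.out.println(words(source));        // [hello, mr, DJ, i, like, only, PJ]
        System.out.println(maskInitials(source)); // hello mr. XX, i like only XX

        // With CASE_INSENSITIVE, the lower-case pattern "dj" also finds "DJ".
        Pattern dj = Pattern.compile("dj", Pattern.CASE_INSENSITIVE);
        System.out.println(dj.matcher(source).find()); // true
    }
}
```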

Also see some regex examples.
