Wednesday, April 20, 2011

Locale-based or natural-language-based text comparison in Java

The String class doesn't have the ability to compare text from a natural language perspective.

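Its equals and compareTo methods compare the individual char values in the strings. For two strings name1 and name2, the equals method returns true only if both have the same length and the char value at index n in name1 is the same as the char value at index n in name2 for every index n. The compareTo method orders strings by those same char values, so every uppercase ASCII letter sorts before every lowercase one.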
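To make the char-value behaviour concrete, here is a minimal sketch. The class name StringCompareDemo and the helper sign are made up for illustration; only String.equals, String.compareTo and Integer.signum are real API calls.

```java
public class StringCompareDemo {

    // Hypothetical helper: reduces String.compareTo to -1, 0 or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        // 'H' is U+0048 and 'c' is U+0063, so by char value "Hat" sorts before "cat".
        System.out.println(sign("cat", "Hat"));   // prints 1: "cat" is considered greater
        System.out.println(sign("Hat", "cat"));   // prints -1
        // equals is also a plain char comparison, so case matters.
        System.out.println("cat".equals("Cat"));  // prints false
    }
}
```

This is the opposite of the order a reader expects from a dictionary, which is the gap the rest of this post addresses.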

The java.text.Collator class provides natural language comparisons. Natural language comparisons depend on locale-specific rules that determine the equality and ordering of characters in a particular writing system. A Collator object understands that people expect "cat" to come before "Hat" in a dictionary. Using a collator comparison, the following code prints cat < Hat.

import java.text.Collator;
import java.util.Locale;

// Get a collator that follows US English rules.
Collator collator = Collator.getInstance(new Locale("en", "US"));
// Or, equivalently: Collator.getInstance(Locale.US);
int comparison = collator.compare("cat", "Hat");
if (comparison < 0) {
  System.out.printf("%s < %s\n", "cat", "Hat");
} else {
  System.out.printf("%s < %s\n", "Hat", "cat");
}

Because Collator implements Comparator, it can also be used to sort words by locale, e.g. with Collections.sort():

import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

List<String> boyNames = new ArrayList<String>();
boyNames.add("Ankit");
boyNames.add("Himanshu");
boyNames.add("Rohit");
boyNames.add("Neerav");
boyNames.add("Gaurav");

// Define a collator for US English.
Collator collator = Collator.getInstance(Locale.US);

// Sort the list based on the collator.
Collections.sort(boyNames, collator);
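The names above all start with an uppercase letter, so a collator sort and a plain char-value sort give the same result for them. The sketch below uses a mixed-case word list to show where the two differ. The class name CollatorSortDemo, both helper methods and the sample words are assumptions made for this illustration and are not part of the original example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {

    // Hypothetical helper: returns a copy sorted by String's natural (char value) order.
    static List<String> sortByCharValue(List<String> words) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy);
        return copy;
    }

    // Hypothetical helper: returns a copy sorted with the collator for the given locale.
    static List<String> sortWithCollator(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "Hat", "apple", "Bob");

        // Uppercase letters have smaller char values, so they come first.
        System.out.println(sortByCharValue(words));             // [Bob, Hat, apple, cat]

        // The US English collator orders by letter first, as a dictionary would.
        System.out.println(sortWithCollator(words, Locale.US)); // [apple, Bob, cat, Hat]
    }
}
```

Both helpers sort a copy so the input list is left untouched, which makes it easy to compare the two orderings side by side.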

