Sunday, May 1, 2011

String equality and interning in java

Strings in Java are objects, but resemble primitives (such as ints or chars) in that Java source code may contain String literals, and Strings may be concatenated using the “+” operator. These are convenient features, but the similarity of Strings to primitives sometimes causes confusion when Strings are compared.

As we saw here, how java deals with string comparisons. Lets understand the case 2, where == operator returns true for 2 different references having same values.
To save memory (and speed up testing for equality), Java supports “interning” of Strings. When the intern() method is invoked on a String, a lookup is performed on a table of interned Strings. If a String object with the same content is already in the table, a reference to the String in the table is returned. Otherwise, the String is added to the table and a reference to it is returned. The result is that after interning, all Strings with the same content will point to the same object. This saves space, and also allows the Strings to be compared using the == operator, which is much faster than comparison with the equals(Object) method.

Confusion can arise because Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values. Code written based on this assumption will fail in a potentially non-obvious way when the == operator is used to compare Strings with equal content but contained in different String instances.
Following test cases show how interning can be performed by java:
Consider following string i.e “A String":
 
String aString = "A String"; 



Case 1: Concatenated string


String aConcatentatedString = "A" + " " + "String";


aString == aConcatentatedString       : true
aString.equals(aConcatentatedString) : true

Case 2: Runtime string




String aRuntimeString = new String("A String");


aString == aConcatentatedString       : false
aString.equals(aConcatentatedString) : true


Case 3: Interned string


String anInternedString = aRuntimeString.intern();

aString == aConcatentatedString : true
aString.equals(aConcatentatedString) : true



Case 4: External strings , eg. 1st argument of main method


String firstArg = args[0];


aString == aConcatentatedString : false
aString.equals(aConcatentatedString) : true


Case 5: Using intern on external strings


String firstArgInterned = firstArg.intern();

aString == aConcatentatedString : true
aString.equals(aConcatentatedString) : true




So we can see that explicitly invoking intern() returns a reference to the interned String.

Similar to string, there is a pool of integers, bytes, etc and other value based classes like bigdecimal. See here for more on this.

No comments:

Post a Comment

Chitika