Thursday, October 28, 2010

Replace word

Problem: Write a method to replaces all occurences a word in a string with another word. Assume the method signature is

   static String replaceWord(String original, String find, String replacement)

Note that Strings are immutable, so we must create a new String. The word "replace" is probably misleading. Also, the method is static because it doesn't reference instance variables. This allows some optimizations by the compiler.

A simple, but incorrect, solution


This first attempt only replaces one word, but we could fix that up later with a loop -- but it's not worth the effort because it's wrong anyway.

//--------------------------------------------- replaceWord() 
// BAD BAD BAD BAD BAD
static String replaceWord(String original, String find, String replacement) {
int i = original.indexOf(find);
if (i < 0) {
return original; // return original if 'find' is not in it.
}

String partBefore = original.substring(0, i);
String partAfter = original.substring(i + find.length());

return partBefore + replacement + partAfter;
}

This example method replaces the first occurrence of a one string with another. If we execute the following code,

   t = "A great man";
s = replaceWord(t, "man", "woman");

the value in s will be "A great woman". Ok, but what if we changed the orginal string to contain the word "human", or anything else that contained a substring, but not a separate word, that has "man" in it.

   t = "A great human";
s = replaceWord(t, "man", "woman");

Now s will contain "A great huwoman". Not what was intended.

The solution is to know about "words"


Simply finding a substring isn't sufficient. There are several ways to solve the problem.


  • Check the character preceding and following the string to see if they are delimiters, and can't be part of a word, or the word boundary is at the beginning or end of the string.
  • Use java.util.StringTokenizer. This is a common solution, but has been superceded by regular expressions (see below).
  • Java 1.4 added Regular Expressions, which is the most common way to solve the problem in most other programming languages. [TO DO: Write solution using regular expressions.]

StringTokenizer solution


This replaces all occurences of a "word", not just the first.

static String replaceAllWords1(String original, String find, String replacement) {
String result = "";
String delimiters = "+-*/(),. ";
StringTokenizer st = new StringTokenizer(original, delimiters, true);
while (st.hasMoreTokens()) {
String w = st.nextToken();
if (w.equals(find)) {
result = result + replacement;
} else {
result = result + w;
}
}
return result;
}

This breaks the string into tokens, consisting of the text between delimiters, as well as the tokens consisting of the delimiters. The true parameter causes the delimiters to also be returned as tokens. You might not need them if you are only interested in what comes between them. Naturally, you would choose the delimiters appropriate to your problem.

Buiding a string by sucessive concatenations is inefficient, as it creates new string as it is immutable and should be replaced by StringBuilder (or the equivalent pre-Java 5 StringBuffer) for efficiency. See below.

 

Using StringBuilder to build a String


Creating new strings in a loop can be inefficient because of the number of new objects that are created and discarded. It's better to use a StringBuilder (introduced Java 5) or the slightly slower, synchronized, StringBuffer, which has been in Java from the beginning.

static String replaceAllWords2(String original, String find, String replacement) {
StringBuilder result = new StringBuilder(original.length());
String delimiters = "+-*/(),. ";
StringTokenizer st = new StringTokenizer(original, delimiters, true);
while (st.hasMoreTokens()) {
String w = st.nextToken();
if (w.equals(find)) {
result.append(replacement);
} else {
result.append(w);
}
}
return result.toString();
}
This creates a StringBuilder with an initial capacity equal to the original string. It will expand if necessary, so this initial capacity isn't critical. The append(...) method is used to add to the end of it. The toString() call at the end returns a String. This is an efficient conversion.

No comments:

Post a Comment

Chitika