Sunday, March 13, 2011

Multimaps

A multimap is like a map, except it can map each key to multiple values. The Collections Framework doesn't include an interface for multimaps, because they aren't used all that commonly. It's a fairly simple matter to use a Map whose values are List objects as a multimap. This technique is demonstrated in the next code example, which reads a dictionary containing one word per line (all lowercase) and prints out all of the permutation groups in the dictionary that meet a size criterion. A permutation group is a bunch of words all of which contain exactly the same letters, but in a different order. The program takes two arguments on the command line: the name of the dictionary file, and the minimum size of permutation group to print out. Permutation groups containing fewer words than the specified minimum are not printed.
There is a standard trick for finding permutation groups: for each word in the dictionary, alphabetize the letters in the word (that is, reorder the word's letters into alphabetical order) and put an entry into a multimap, mapping the alphabetized word to the original word. For example, the word "bad" causes an entry mapping "abd" into "bad" to be put into the multimap. A moment's reflection will show that all of the words to which any given key maps form a permutation group. It's a simple matter to iterate over the keys in the multimap, printing out each permutation group that meets the size constraint.
The following program is a straightforward implementation of this technique. The only tricky part is the alphabetize method, which returns a string containing the same characters as its argument, in alphabetical order. This routine (which has nothing to do with the Collections Framework) implements a slick bucket sort. It assumes that the word being alphabetized consists entirely of lowercase alphabetic characters.
import java.util.*;
import java.io.*;

public class Perm {
    public static void main(String[] args) {
        int minGroupSize = Integer.parseInt(args[1]);
 
        // Read words from file and put into simulated multimap
        Map m = new HashMap();
        try {
            BufferedReader in =
                   new BufferedReader(new FileReader(args[0]));
            String word;
            while((word = in.readLine()) != null) {
                String alpha = alphabetize(word);
                List l = (List) m.get(alpha);
                if (l==null)
                    m.put(alpha, l=new ArrayList());
                l.add(word);
            }
        } catch(IOException e) {
            System.err.println(e);
            System.exit(1);
        }

        // Print all permutation groups above size threshold
        for (Iterator i = m.values().iterator(); i.hasNext(); ) {
            List l = (List) i.next();
            if (l.size() >= minGroupSize)
                System.out.println(l.size() + ": " + l);
        }
    }

    private static String alphabetize(String s) {
        int count[] = new int[256];
        int len = s.length();
        for (int i=0; i<len; i++)
            count[s.charAt(i)]++;
        StringBuffer result = new StringBuffer(len);
        for (char c='a'; c<='z'; c++)
            for (int i=0; i<count[c]; i++)
                result.append(c);
        return result.toString();
    }
}
Running the program on an 80,000 word dictionary takes about 16 seconds on an aging UltraSparc 1. With a minimum permutation group size of eight, it produces the following output:
% java Perm dictionary.txt 8

 9: [estrin, inerts, insert, inters, niters, nitres, sinter,
     triens, trines]
 8: [carets, cartes, caster, caters, crates, reacts, recast, traces]
 9: [capers, crapes, escarp, pacers, parsec, recaps, scrape, secpar,
     spacer]
 8: [ates, east, eats, etas, sate, seat, seta, teas]
12: [apers, apres, asper, pares, parse, pears, prase, presa, rapes,
     reaps, spare, spear]
 9: [anestri, antsier, nastier, ratines, retains, retinas, retsina,
     stainer, stearin]
10: [least, setal, slate, stale, steal, stela, taels, tales, teals,
     tesla]
 8: [arles, earls, lares, laser, lears, rales, reals, seral]
 8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]
 8: [aspers, parses, passer, prases, repass, spares, sparse, spears]
 8: [earings, erasing, gainers, reagins, regains, reginas, searing,
     seringa]
11: [alerts, alters, artels, estral, laster, ratels, salter, slater,
     staler, stelar, talers]
 9: [palest, palets, pastel, petals, plates, pleats, septal, staple,
     tepals]
 8: [enters, nester, renest, rentes, resent, tenser, ternes, treens]
 8: [peris, piers, pries, prise, ripes, speir, spier, spire]
Many of these words seem a bit bogus, but that's not the program's fault; they're in the dictionary file.

No comments:

Post a Comment

Chitika