The general-purpose implementations are summarized in the table below. The table highlights their regular naming pattern: names are all of the form <
JDK 1.2 provides two implementations of each interface (with the exception of
The fact that the new implementations are unsynchronized represents a break with the past:
If you need a synchronized collection, the synchronization wrappers, described in the next section, allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for the new collection implementations where it was mandatory for the old.
As a rule of thumb, you should be thinking about the interfaces rather than the implementations. That is why there are no programming examples in this lesson. For the most part, the choice of implementation affects only performance. The preferred style, as was mentioned in the interfaces lesson, is to choose an implementation when a collection is created, and immediately assign the new collection to a variable of the corresponding interface type (or pass the collection to a method expecting an argument of the interface type). In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer or maintainer with the freedom to change implementations at the drop of a hat, if performance concerns dictate that it is the right thing to do.
The general purposes implementations are briefly discussed below. The performance of the implementations is described using words like constant, log, linear, n log(n) and quadratic. These words refer to the asymptotic upper bound on the time complexity of performing the operation. All of this is quite a mouthful, and it doesn't matter much if you don't know what it means. If you're interested, any good algorithms textbook explains this stuff. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes the nominally slower implementation may be faster for the collection size that you're actually using. When in doubt, measure the performance.
Implementation
> <Interface
>, where <Interface
> is the core collection interface implemented by the class, and <Implementation
> signifies the data structure underlying the implementation. Implementations | |||||
---|---|---|---|---|---|
Hash Table | Resizable Array | Balanced Tree | Linked List | ||
Interfaces | Set | HashSet | TreeSet | ||
List | ArrayList | LinkedList | |||
Map | HashMap | TreeMap |
Collection
, which has no direct implementations, but serves as a least common denominator for the other collection interfaces). In each case, one implementation is clearly the primary implementation: the one to use, all other things being equal. The primary implementations are HashSet
, ArrayList
and HashMap
. Note that the SortedSet
and SortedMap
interfaces do not have rows in the table above. Each of these interfaces has one implementation and these implementations (TreeSet
and TreeMap
) are listed in the Set
and Map
rows. Not only do the implementations have consistent names, but they have consistent behavior as well. All of them implement all the optional operations contained in their interfaces. All permit null
elements, keys and values. Each one is unsynchronized. All have fail-fast iterators, which detect illegal concurrent modification during iteration and fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. All are Serializable
, and all support a public clone
method. The fact that the new implementations are unsynchronized represents a break with the past:
Vector
and Hashtable
were synchronized in versions of the JDK prior to 1.2. The new approach was taken because it was recognized that collections are frequently used in a manner where the synchronization is of no benefit. Such uses include single-threaded use, read-only use, and use as part of a larger data object that does its own synchronization. In general, it is good API design practice not to make users pay for a feature that they generally don't use. Further, unnecessary synchronization can result in deadlock under certain circumstances. If you need a synchronized collection, the synchronization wrappers, described in the next section, allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for the new collection implementations where it was mandatory for the old.
As a rule of thumb, you should be thinking about the interfaces rather than the implementations. That is why there are no programming examples in this lesson. For the most part, the choice of implementation affects only performance. The preferred style, as was mentioned in the interfaces lesson, is to choose an implementation when a collection is created, and immediately assign the new collection to a variable of the corresponding interface type (or pass the collection to a method expecting an argument of the interface type). In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer or maintainer with the freedom to change implementations at the drop of a hat, if performance concerns dictate that it is the right thing to do.
The general purposes implementations are briefly discussed below. The performance of the implementations is described using words like constant, log, linear, n log(n) and quadratic. These words refer to the asymptotic upper bound on the time complexity of performing the operation. All of this is quite a mouthful, and it doesn't matter much if you don't know what it means. If you're interested, any good algorithms textbook explains this stuff. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes the nominally slower implementation may be faster for the collection size that you're actually using. When in doubt, measure the performance.
Set
The two general purpose Setimplementations are HashSetand TreeSet. It's very straightforward to decide which of these two to use.HashSet
is much faster (constant time vs. log time for most operations), but offers no ordering guarantees. If you need to use the operations in theSortedSet
, or in-order iteration is important to you, useTreeSet
. Otherwise, useHashSet
. It's a fair bet that you'll end up usingHashSet
most of the time. One thing worth keeping in mind aboutHashSet
is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, it's important to choose an appropriate initial capacity if iteration performance is important. Choosing a capacity that's too high can waste space as well as time. The default initial capacity is 101, and that's often more that you need. The initial capacity may be specified using theint
constructor. To allocate aHashSet
whose initial capacity is 17:
Set s= new HashSet(17);HashSet
s have one other "tuning parameter" called the load factor. If you care deeply about the space consumption of yourHashSet
, read theHashSet
documentation for more information. Otherwise just live with the default. If you accept the default load factor but you do want to specify an initial capacity, pick a number that's about twice the size that you expect theSet
to grow to. If your guess is way off, it may have to grow or you may waste a bit of space, but either way it's no big problem. If you know a prime number of about the right size, use it. If not, use an odd number. Or use an even number. It doesn't really matter much; these things might make theHashSet
perform a wee bit better, but nothing to write home about.
TreeSet
has no tuning parameters. With the exception ofclone
, neitherHashSet
norTreeSet
have any operations other than those required by their respective interfaces (Set
andTreeSet
).
List
The two general purposeList
implementations areArrayList
andLinkedList
. Most of the time, you'll probably useArrayList
. It offers constant time positional access, and it's just plain fast, because it does not have to allocate a node object for each element in theList
, and it can take advantage of the native methodSystem.arraycopy
when it has to move multiple elements at once. Think ofArrayList
asVector
without the synchronization overhead. If you frequently add elements to the beginning of theList
, or iterate over theList
deleting elements from its interior, you might want to considerLinkedList
. These operations are constant time in aLinkedList
but linear time in anArrayList
. But you pay a big price! Positional access is linear time in aLinkedList
and constant time in anArrayList
. Furthermore, the constant factor forLinkedList
is much worse. If you think that you want to use aLinkedList
, measure the performance with bothLinkedList
andArrayList
. You may be surprised.
ArrayList
has one tuning parameter, the initial capacity. It refers to the number of elements theArrayList
can hold before it has to grow. There's not much to say about it. The onlyArrayList
operations that are not required byList
areensureCapacity
andtrimToSize
(which alter the excess capacity), andclone
.
LinkedList
has no tuning parameters, and seven optional operations, one of which isclone
. The other six areaddFirst
,getFirst
,removeFirst
,addLast
,getLast
, andremoveLast
; I have very mixed feelings about them. They make it a bit more convenient to use aLinkedList
as a queue or a double-ended queue (dequeue), but they prevent you from easily switching representations when you discover thatArrayList
is faster.
If you need synchronization, aVector
will be slightly faster than anArrayList
synchronized withCollections.synchronizedList
, butVector
has loads of legacy operations, so be extra careful to always manipulate theVector
with theList
interface, or you'll be stuck with it for life.
If yourList
is fixed in size (that is, you'll never useremove
,add
or any of the bulk operations other thancontainsAll
) you have a third option that's definitely worth considering. SeeArrays.asList
in the convenience implementations section.
Map
The two general purposeMap
implementations areHashMap
andTreeMap
. The situation forMap
is exactly analogous toSet
. If you needSortedMap
operations or in-orderCollection
-view iteration, go forTreeMap
; otherwise, go forHashMap
. Everything else in theSet
section (above) also applies toMap
so just re-read it. Completeness requires that we mentionHashtable
. As withVector
andArrayList
, if you need synchronization, aHashtable
will be slightly faster than aHashMap
synchronized withCollections.synchronizedMap
. Again,Hashtable
has loads of legacy operations, so be extra careful always to manipulate it with theMap
interface, or you'll be stuck with it for life.
No comments:
Post a Comment