Showing posts with label set. Show all posts
Showing posts with label set. Show all posts

Sunday, May 29, 2011

Introduction to Sets in java

A Set(in the API reference documentation) is a Collection(in the API reference documentation) that cannot contain duplicate elements. Set models the mathematical set abstraction. The Set interface contains no methods other than those inherited from Collection. It adds the restriction that duplicate elements are prohibited. Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set objects with different implementation types to be compared meaningfully. Two Set objects are equal if they contain the same elements.

Wednesday, May 25, 2011

Operations on SortedSet

The operations that SortedSet inherits from Set behave identically on sorted sets and normal sets with two exceptions:
  • The Iterator returned by the iterator operation traverses the sorted set in order.
  • The array returned by toArray contains the sorted set's elements in order.
Although the interface doesn't guarantee it, the toString method of the JDK's SortedSet implementations returns a string containing all the elements of the sorted set, in order.

Tuesday, May 24, 2011

ConcurrentSkipList class in java

There are two new interfaces in Java 6 SE called "NavigableMap" and "NavigableSet" which facilitate navigating through collections. NavigableSet extends SortedSet and is implemented by TreeSet and concurrentSkipListSet (a new class in Java collection).
ConcurrentSkipListSet is one of the class that implements NavigableSet and it is used to return the closest matches of elements. It includes methods to return iterators in ascending and descending orders, as well as methods that return a sorted or navigable set for a special portion of data.

NavigableSet nset = new ConcurrentSkipListSet();

nset.add("20");

nset.add("60");

nset.add("50");

nset.add("40");

nset.add("30");

nset.add("80");
// Returns an iterator over the elements in navigable set,
//in ascending order.
Iterator iterator = nset.iterator();

//set in ascending order

while (iterator.hasNext())

{ //Ascending order list

System.out.print(iterator.next() + " ");

}

//Descending order list

System.out.println("In descending order : " + nset.descendingSet() + "\n");
//remove element
nset.pollLast();

Thursday, May 19, 2011

Range-view Operations on SortedSet

The Range-view operations are somewhat analogous to those provided by the List interface, but there is one big difference. Range-views of a sorted set remain valid even if the backing sorted set is modified directly. This is feasible because the endpoints of a range view of a sorted set are absolute points in the element-space, rather than specific elements in the backing collection (as is the case for lists). A range-view of a sorted set is really just a window onto whatever portion of the set lies in the designated part of the element-space. Changes to the range-view write back to the backing sorted set and vice-versa. Thus, it's OK to use range-views on sorted sets for long periods of time (unlike range-views on lists). Sorted sets provide three range-view operations. The first, subSet takes two endpoints (like subList). Rather than indices, the endpoints are objects. They must be comparable to the elements in the sorted set (using the set's Comparator or the natural ordering of its elements, whichever the set uses to order itself). Like subList the range is half-open, including its low endpoint but excluding the high one.
Thus, the following one-liner tells you how many words between "doorbell" and "pickle", including "doorbell" but excluding "pickle", are contained in a SortedSet of strings called dictionary:
int count = dictionary.subSet("doorbell", "pickl

See also post -Operations on SortedSet

Standard Constructors for SortedSet interface

By convention, all Collection implementations provide a standard constructor that takes a Collection, and SortedSet implementations are no exception. This constructor creates a SortedSet object that orders its elements according to their natural order. Additionally, by convention, SortedSet implementations provide two other standard constructors:

  • One that takes a Comparator and returns a new (empty) SortedSet sorted according to the specified Comparator.
  • One that takes a SortedSet and returns a new SortedSet containing the same elements as the given SortedSet, and sorted according to the same Comparator (or using the elements' natural ordering, if the specified SortedSet did too). Note that the compile-time type of the argument determines whether this constructor is invoked in preference to the ordinary Set constructor, and not the runtime type!
The first of these standard constructors is the normal way to create an empty SortedSet with an explicit Comparator. The second is similar in spirit to the standard Collection constructor: it creates a copy of a SortedSet with the same ordering, but with a programmer-specified implementation type.

Operations on SortedSet

The operations that SortedSet inherits from Set behave identically on sorted sets and normal sets with two exceptions:

  • The Iterator returned by the iterator operation traverses the sorted set in order.
  • The array returned by toArray contains the sorted set's elements in order.
Although the interface doesn't guarantee it, the toString method of the JDK's SortedSet implementations returns a string containing all the elements of the sorted set, in order.

See also post - RangeView operations on SortedSet

Set implementations in java

Being a Collection subtype all methods in the Collection interface are also available in the Set interface.
Since Set is an interface you need to instantiate a concrete implementation of the interface in order to use it. You can choose between the following Set implementations in the Java Collections API:
  • java.util.EnumSet
  • java.util.HashSet
  • java.util.LinkedHashSet
  • java.util.TreeSet
Each of these Set implementations behaves a little differently with respect to the order of the elements when iterating the Set, and the time (big O notation) it takes to insert and access elements in the sets

Wednesday, May 18, 2011

HashSet class in java

Introduction

  • Set interface implemented using a hash table
  • doesn't guarantee to iterate the elements in any specific order
  • constant time access to elements assuming good hash
    function

HashSet class Constructors

HashSet is implemented with an underlying HashMap. In addition to implemented the Set interface methods, HashSet has the following constructors.
Result Constructor Description
hset = new HashSet() Creates a new HashSet with default initial capacity 16 and load factor 0.75.
hset = new HashSet(initialCapacity) Creates a new HashSet with the specified initial int capacity.
hset = new HashSet(initialCapacity, loadFactor) Creates a new HashSet with the specified capacity which will not exceed a specified (float) load factor.
hset = new HashSet(coll) Creates a new HashSet with elements from the Collection coll

TreeSet class constructors in java

Introduction

  • Set interface implemented as a tree
  • Iterator return elements in natural order (see later)

Constructors of TreeSet Class

TreeSet implements the Set and SortedSet interface methods. TreeSet is implemented with an underlying TreeMap (balanced binary tree). If the element type has a natural order (eg, String) elements will be ordered by that, but often you will supply a Comparator object that tells how two elements compare. It has the following constructors.

Result Constructor Description
tset = new TreeSet() Creates new TreeSet. Elements sorted by natural order.
tset = new TreeSet(comp) Creates new TreeSet using Comparator comp to sort elements.
tset = new TreeSet(coll) Creates new TreeSet from Collection coll using natural ordering.
tset = new TreeSet(sset) Creates new TreeSet from SortedSet smp using element ordering from sset.

SortedSet interface methods in java

The SortedSet interface is used by TreeSet and adds additional methods to reflect that a TreeSet is sorted.
SortedSet sset;

Result Method Description
comp = sset.comparator() Returns Comparator used to compare elements. null if natural ordering used (eg, String).
obj = sset.firstKey() First element (in sorted order).
obj = sset.lastKey() Last element (in sorted order).
sset = sset.headMap(obj) Returns SortedSet of all elements less than obj.
sset = sset.tailMap(obj) Returns SortedSet of all elements greater than or equal to obj.
sset = sset.subMap(from, to) Returns SortedSet of all elements greater than or equal to from and less than to.

Sunday, May 15, 2011

Performance of Set interface implementations

HashSet

The HashSet class offers constant-time [ Big O Notation is O(1) ] performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

 

TreeSet

The TreeSet implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

 

LinkedHashSet

A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.

Set & List interface extend Collection, but Map interface don't?

Map is considered part of collection framework, but it does not extend collection interface. The reason behind it is the design.But what design do we follow?

And the answer to this questions is best described in Sun's FAQ Page: This was by design. We feel that mappings are not collections and collections are not mappings. Thus, it makes little sense for Map to extend the Collection interface (or vice versa). If a Map is a Collection, what are the elements? The only reasonable answer is "Key-value pairs", but this provides a very limited (and not particularly useful) Map abstraction. You can't ask what value a given key maps to, nor can you delete the entry for a given key without knowing what value it maps to.

Collection could be made to extend Map, but this raises the question: what are the keys? There's no really satisfactory answer, and forcing one leads to an unnatural interface. Maps can be viewed as Collections (of keys, values, or pairs), and this fact is reflected in the three "Collection view operations" on Maps (keySet, entrySet, and values). While it is, in principle, possible to view a List as a Map mapping indices to elements, this has the nasty property that deleting an element from the List changes the Key associated with every element before the deleted element. That's why we don't have a map view operation on Lists.

Saturday, May 14, 2011

Implementing set using arrays

We can implement tree by arrays. Likewise, this program shows how to implement set using arrays:

public class Set<T> {

private T arrayElement[];

int size =0;

public Set(){

this.arrayElement = null;

}

public Set(T[] element){

arrayElement = element;

size = arrayElement.length;

}

/**

*add element to set. A check is made to identify whether element is present or not.

*If not the element can be inserted.

* @param element

*/

public void addElement(T element){

if(!contains(element)){

if(size == arrayElement.length){

incrementArray();

}

arrayElement[size++] = element;

}

}



/**

* to check is element is present or not.

* @param elem

* @return boolean

*/

public boolean contains(T elem){



if (elem == null) {

for (int i = 0; i < size; i++)

if (arrayElement[i]==null)

return true;

} else {

for (int i = 0; i < size; i++)

if (elem.equals(arrayElement[i]))

return true;

}

return false;



}



/**

* return the size of set.

* @return int

*/

public int size(){

if(arrayElement != null){

return arrayElement.length;

}else

return 0;

}



public void clear(){

arrayElement = null;

}



public String toString(){

if(arrayElement == null || arrayElement.length ==0 ){

return“[EMPTY]“;

}else{

String toStr=”[";

for(int i=0;i<arrayElement.length;i++){

toStr+=arrayElement[i]+”,”;

}

toStr+=”]”;

return toStr;

}

}



/**

* to check whether set is empty or not

* @return

*/

public boolean isEmpty(){

if(arrayElement == null || arrayElement.length ==0 )

return true;

else

return false;

}



/**

* this function is used to increment the size of an array

*

*/

private void incrementArray(){

T[] temparray = arrayElement;

int tempsize=size+5;

arrayElement =(T[]) new Object[tempsize];

System.arraycopy(temparray, 0, arrayElement, 0, size);



}

}//Set class ends

Monday, April 18, 2011

Getting keys from hashtable: Iterating over hashtable

Consider the hashtable:

Hashtable ht = new Hashtable(); 
ht.put("1","One"); 
ht.put("2","Two"); 
ht.put("3","Three");

Now getting the keys set can be got as follows:

As a set
Set st = ht.keySet();
 
    System.out.println("Set created from Hashtable Keys contains :");
    //iterate through the Set of keys
    Iterator itr = st.iterator();
    while(itr.hasNext())
      System.out.println(itr.next());
Note:
Please note that resultant Set object is backed by the Hashtable.
Any key that is removed from Set will also be removed from
original Hashtable object. The same is not the case with the element addition.
Eg.
st.remove("2");

Getting from enum:
Enumeration e = ht.keys();
    while(e.hasMoreElements())
      System.out.println(e.nextElement());
  }


When to use which set implementation?

HashSet:
HashSet is backed by a HashMap instance and hence it allows the null element. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

When do you use HashSet?
When you are looking for performance, use HashSet. Since this class uses the hash function when retrieving the elements, it allows fast retrieval. This class offers constant time performance for the basic operations add, remove, contains and size, assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance the number of buckets. Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

LinkedHashSet:
LinkedHashSet is backed by LinkedHashMap. So all the elements in LinkedHashSet are actually the keys in the LinkedHashMap. It maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). This class provides all of the optional Set operations, and permits null elements.

When do you use LinkedHashSet?
When order of insertion matter. Example
When you are looking to produce a copy of a set that has the same order as the original, regardless of the original set's implementation without incurring the increased cost (associated with TreeSet).

void foo(Set s) {
    Set copy = new LinkedHashSet(s);
    ...
}

This technique is particularly useful if a module takes a set on input, copies it, and later returns results whose order is determined by that of the copy. Like HashSet, it provides constant-time performance for the basic operations add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.

TreeSet:
TreeSet is backed by TreeMap instance and TreeSet wont permit null elements. As opposing to HashSet, TreeSet provides a total ordering on its elements and especially when you need a sorted order. The elements are ordered either by using their plain Comparable natural ordering (if you dont pass any parameter for the TreeSet constructor) or by a Comparator typically provided at sorted set creation time as a parameter to the constructor. All elements inserted into this set must either implement the Comparable interface or atleast be accepted by the specified comparator. So all such elements must be mutually comparable i.e., e1.compareTo(e2) (or comparator.compare(e1, e2)) must not throw a ClassCastException.

When do you use TreeSet?
Its very well explained above that TreeSet will be used when the sorted order of inserted elements is required.
What about the performance? The performance obviously will be low compared to HashSet. A TreeSet may be accessed and traversed in either ascending or descending order. The descendingSet method returns a view of the set with the senses of all relational and directional methods inverted. The performance of ascending operations and views is likely to be faster than that of descending ones.

EnumSet:
EnumSet as the name itself says is a specialized set for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created. Enum sets are represented internally as bit vectors. This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags." Even bulk operations (such as containsAll and retainAll) should run very quickly if their argument is also an enum set. All basic operations execute in constant time. They are likely (though not guaranteed) to be much faster than their HashSet counterparts. Even bulk operations execute in constant time if their argument is also an enum set.
The iterator returned by the iterator method traverses the elements in their natural order (the order in which the enum constants are declared). The returned iterator is weakly consistent: it will never throw ConcurrentModificationException and it may or may not show the effects of any modifications to the set that occur while the iteration is in progress.

Null elements are not permitted in EnumSet. Attempts to insert a null element will throw NullPointerException. Attempts to test for the presence of a null element or to remove one will, however, function properly.
EnumSet internally uses RegularEnumSet class, a private implementation class for EnumSet, for "regular sized" enum types i.e., those with 64 or fewer enum constants while it used JumboEnumSet class if the enum constants are more than 64 which is another private implementation class for EnumSet.

Iterators returned by set implementation and concurrent modification

Except EnumSet, the iterators returned by the implemented class's (HashSet/TreeSet) iterator method are fail-fast i.e., if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. But again it is not guaranteed to throw "ConcurrentModificationException" by iterator all the time but will be thrown on best-effort basis. So dont rely on this exception for the correctness of the program.

Collections and serializablity

Collections don't extend Serializable interface but its implementations do.Similar note go for other interfaces like sets, lists.

Set doesn't extends Serializable interface but its implementations (HashSet, TreeSet, EnumSet) implements Serializable interface and so contains methods writeObject and readObject to save or reconstitute the state of Set (HashSet, TreeSet, EnumSet) instance to/from a stream. When you serialize this instance, writeObject and readObject methods will be called.

Collections like Vector and Hashtable both implements Serializable and have been designed for serialization.

One thing to keep in mind is to watch out for is, that in order to serialize a collection like Vector or Hashtable, you must also be able to serialize all of the objects contained in these collections. Otherwise, the collection would not be able to be completely restored. Your program will throw a NotSerializableException unless all objects stored in the Vector or Hashtable are also serializable.

Eg.
If your class implements Serializable and all of its members are serializable, then the object can be serialized correctly. Example:

public class Person implements Serializable {
    private String name;
    private Collection<Integer> medals = new ArrayList<Integer>();
}


As long as medals's instance is serializable (such as ArrayList), and its members are serializable (in this case Integers) , then the object will serialize.



Set tutorial in java

A Set(in the API reference documentation) is a Collection(in the API reference documentation) that cannot contain duplicate elements. Set models the mathematical set abstraction. The Set interface contains no methods other than those inherited from Collection. It adds the restriction that duplicate elements are prohibited. Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set objects with different implementation types to be compared meaningfully. Two Set objects are equal if they contain the same elements.
 

Set Interfaces

 

Set Implementations

 

Operations on Sets

 

SortedSet

 

How is set implemented internally?

Set and serializations

Sets and synchronization

Iterators returned by set implementation and concurrent modification

When to use which set implementation?

Performance of set interface implementations

 

How is set implemented in java?

You know what a "Set" is in java. Its an interface which extends collection interface. It contains no duplicate elements i.e., no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. All is fine and cool until here.
But the question is how set maintains unique elements? Here is how. Internally set maintains a Map when a Set (HashSet/Treeset/etc) is instantiated. See below code of HashSet constructor.

/**
 * Constructs a new, empty set; the backing HashMap instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap();
}

So HashSet creates an instance of HashMap when HashSet is instantiated, TreeSet creates an instance of TreeMap when TreeSet is instantiated and so on. The keys in the map are the elements you add to Set. Then what are values? Values are dummy objects. When you add an element to the Set, the element will be added as key and "new Object()" (dummy Object instance) will be added as value which is a dummay value. What will happen if you add a duplicate object to Set? If the set already contains the element, the call to add method leaves the set unchanged and returns false. If the set does not contain the element, the call to add method adds the element and returns true.

Thursday, March 31, 2011

Getting Synchronized set

By default set is not synchronized. If multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method like below :

Hashset :

HashSet hashSet = new HashSet();
Set set = Collections.synchronizedSet(hashSet);

TreeSet :

TreeSet
treeSet = new TreeSet(); 
Set set = Collections.synchronizedSet(treeSet);

Chitika