Tuesday, November 15, 2011

HashMap and Hashtable


Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

How HashMap Works:

HashMap works on the principle of hashing; it provides put() and get() methods for storing and retrieving objects. When we pass a key and a value to put() to store them in the HashMap, it uses the key object's hashCode() method to calculate a hash code and then, by applying hashing to that hash code, identifies the bucket location for storing the value object.

While retrieving, it uses the key's hash code again to locate the bucket, then uses the key object's equals() method to find the correct key-value pair and returns the value object associated with that key.

In case of a collision, HashMap uses a linked list: the colliding entry is stored in the next node of the linked list. Every node of the linked list stores both the key and the value, so the key is available for the equals() comparison.
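
To make the idea concrete, here is a minimal sketch of a map built from an array of buckets, where each bucket is a linked list of nodes. This is not the real java.util.HashMap source; the class and field names are only for illustration, and it skips resizing and null-key handling.

public class SimpleHashMap<K, V> {

    // Each bucket holds a singly linked list of nodes; every node stores
    // both the key and the value, as described above.
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = new Node[16];

    // The key's hashCode() decides which bucket an entry lands in.
    private int bucketIndex(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(K key, V value) {
        int index = bucketIndex(key);
        // If the key is already present (checked with equals()), overwrite its value.
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;
                return;
            }
        }
        // Otherwise prepend a new node to the bucket's list; a collision simply
        // adds another node to the same linked list.
        buckets[index] = new Node<K, V>(key, value, buckets[index]);
    }

    public V get(K key) {
        // The same hash computation finds the bucket; equals() finds the exact pair.
        for (Node<K, V> n = buckets[bucketIndex(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}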

What will happen if two different HashMap key objects have the same hash code?

They will be stored in the same bucket, but in the next node of the linked list, and the keys' equals() method will be used to identify the correct key-value pair in the HashMap.
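
A small demo of this (the Key class below is contrived purely for illustration, with a deliberately constant hashCode() so every key collides):

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    static final class Key {
        private final String name;

        Key(String name) { this.name = name; }

        @Override
        public int hashCode() { return 42; }   // every Key gets the same hash code

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<Key, String>();
        map.put(new Key("a"), "first");
        map.put(new Key("b"), "second");   // same bucket, stored in the next node

        // equals() picks the right key-value pair out of the shared bucket.
        System.out.println(map.get(new Key("a")));   // prints "first"
        System.out.println(map.get(new Key("b")));   // prints "second"
    }
}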

 Difference between HashMap and Hashtable:
  • The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls (HashMap allows a null key and null values, whereas Hashtable allows neither).
  • HashMap does not guarantee that the order of the map will remain constant over time.
  • HashMap is non synchronized whereas Hashtable is synchronized.
  • The Iterator in HashMap is fail-fast, while the Enumeration for Hashtable is not. A fail-fast iterator throws ConcurrentModificationException if any other thread modifies the map structurally by adding or removing any element, other than through the Iterator's own remove() method. This is not a guaranteed behavior; the JVM does it on a best-effort basis. (See the demo after this list.)
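
Here is a small demo of the null handling and the fail-fast iterator. The class name is just for illustration, and since ConcurrentModificationException is best-effort, the last part is not strictly guaranteed to behave the same on every JVM:

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class MapDifferenceDemo {
    public static void main(String[] args) {
        // HashMap permits a null key and null values.
        Map<String, String> hashMap = new HashMap<String, String>();
        hashMap.put(null, "value for null key");
        hashMap.put("key", null);

        // Hashtable rejects nulls with a NullPointerException.
        Map<String, String> hashtable = new Hashtable<String, String>();
        try {
            hashtable.put(null, "x");
        } catch (NullPointerException e) {
            System.out.println("Hashtable does not allow a null key");
        }

        // HashMap's iterator is fail-fast: a structural modification during
        // iteration (other than via the iterator's own remove()) is detected
        // and reported with a ConcurrentModificationException.
        hashMap.put("a", "1");
        hashMap.put("b", "2");
        try {
            for (String key : hashMap.keySet()) {
                hashMap.remove("a");   // structural modification while iterating
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("fail-fast iterator detected the modification");
        }
    }
}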

Hashtable Performance:

To get better performance from your Java Hashtable, you need to use the following while instantiating a Hashtable: 
  •  use the initialCapacity and loadFactor arguments
  •  use them wisely
initialCapacity is the number of buckets to be created at the time of Hashtable instantiation. The number of buckets and the probability of collision are inversely proportional: if you have more buckets than needed, the possibility of a collision is lower.

For example, if you are going to store 10 elements and you set initialCapacity to 100, you will have 100 buckets. You calculate hashCode() only 10 times across a spectrum of 100 buckets, so the possibility of a collision is very low.

But if you supply an initialCapacity of only 10 for the Hashtable, the possibility of collision is much larger. loadFactor decides when to automatically increase the size of the Hashtable. The default initialCapacity is 11 and the default loadFactor is 0.75, which means that when the Hashtable is 3/4th full, its size is increased.

New capacity in java Hashtable is calculated as follows:
int newCapacity = oldCapacity * 2 + 1;

If you give too small a capacity and loadFactor, the Hashtable often does a rehash(), which will cause performance issues. Therefore, for efficient Hashtable performance in Java, give an initialCapacity about 25% larger than what you need and a loadFactor of 0.75 when you instantiate it.
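
For example (the numbers here are only illustrative), a Hashtable sized up front with the two-argument constructor:

import java.util.Hashtable;

public class HashtableSizingDemo {
    public static void main(String[] args) {
        int expectedEntries = 100;
        // Roughly 25% extra capacity, as suggested above, with a load factor of 0.75.
        int initialCapacity = expectedEntries + expectedEntries / 4;   // 125 buckets
        float loadFactor = 0.75f;

        Hashtable<String, Integer> table =
                new Hashtable<String, Integer>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            table.put("key-" + i, i);
        }
        System.out.println("size = " + table.size());
    }
}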

ConcurrentHashMap:

It is a hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable.

However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.

Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.

For aggregate operations such as putAll and clear, concurrent retrievals may reflect the insertion or removal of only some entries. Additionally, the iterators and enumerations do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
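
For instance, removing an entry while iterating a ConcurrentHashMap does not trigger a ConcurrentModificationException (a minimal sketch; the class name is only for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Structural modification during iteration is tolerated: the iterator
        // is weakly consistent and never throws ConcurrentModificationException.
        for (String key : map.keySet()) {
            if (key.equals("a")) {
                map.remove("b");
            }
        }
        System.out.println("remaining entries: " + map);
    }
}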

Like Hashtable but unlike HashMap, this class does not allow null to be used as either a key or a value.
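
A quick demonstration of the null restriction (the class name is just for illustration):

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapNullDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<String, String>();
        map.put("key", "value");            // fine

        try {
            map.put(null, "value");         // null key is rejected
        } catch (NullPointerException e) {
            System.out.println("null keys are not allowed");
        }

        try {
            map.put("key", null);           // null value is rejected too
        } catch (NullPointerException e) {
            System.out.println("null values are not allowed");
        }
    }
}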

Difference between Hashtable and ConcurrentHashMap :

Both can be used in a multithreaded environment, but once the size of a Hashtable becomes considerably large, performance degrades because the whole table has to be locked for a longer duration during iteration.

Since ConcurrentHashMap introduced the concept of segmentation, however large it becomes, only a certain part of it gets locked to provide thread safety, so many other readers can still access the map without waiting for the iteration to complete.

In summary, ConcurrentHashMap locks only a certain portion of the map, while Hashtable locks the full map while doing iteration.
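
A small two-thread sketch of this difference in practice (names and sizes below are arbitrary): one thread keeps writing to a ConcurrentHashMap while the main thread iterates it, and neither has to wait for a full-table lock. With a Hashtable you would typically synchronize on the whole table for the duration of the iteration, blocking the writer.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentAccessDemo {
    public static void main(String[] args) throws InterruptedException {
        final Map<Integer, String> map = new ConcurrentHashMap<Integer, String>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, "value-" + i);
        }

        // Writer thread keeps adding entries...
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 1000; i < 2000; i++) {
                    map.put(i, "value-" + i);
                }
            }
        });
        writer.start();

        // ...while the main thread iterates at the same time. No exception is
        // thrown, and the reader is never blocked behind a full-table lock.
        int seen = 0;
        for (Integer key : map.keySet()) {
            seen++;
        }
        writer.join();
        System.out.println("iterated over " + seen + " keys; final size = " + map.size());
    }
}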


