Wednesday, 6 September 2017

Java HashMap Implementation


  • HashMap stores key-value pairs.
  • HashMap uses a hash to insert and get elements: hash = key.hashCode() % BUCKET_ARRAY_SIZE (a small index-computation sketch follows this list).
  • Internally it is an array of Entry linked lists; each linked list is identified by its hash, and the list is stored at the hash index of BUCKET_ARRAY.
  • Entries form a linked list by each entry keeping a reference to the next entry.
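
As a quick illustration of the bucket-index computation above, here is a minimal sketch. The class and method names are hypothetical, and the bit mask is an assumption added so that a negative hashCode cannot produce a negative array index:

    public class BucketIndexDemo {
        private static final int BUCKET_ARRAY_SIZE = 16;

        static int indexFor(Object key) {
            // Mask off the sign bit so a negative hashCode cannot
            // yield a negative index.
            return (key.hashCode() & 0x7fffffff) % BUCKET_ARRAY_SIZE;
        }

        public static void main(String[] args) {
            System.out.println(indexFor("apple"));  // an index in [0, 15]
            System.out.println(indexFor("banana")); // may land in the same bucket: a collision
        }
    }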


Steps To Create a HashMap
 Entry Class
Make an Entry class to store the HashMap entries.
Variables: key, value and next.
All entries having the same hash [key.hashCode() % BUCKET_ARRAY_SIZE] are linked together using the next property, which is of type Entry. next holds the next entry, and a new entry is always appended at the end of the linked list.
 Put
Find the bucket (linked list) using hash [key.hashCode() % BUCKET_ARRAY_SIZE].
If the bucket at that index is null, create a new Entry and store it in the array at index key.hashCode() % BUCKET_ARRAY_SIZE.
If the bucket already contains an entry with the same key, override that entry's value.
If the bucket exists but does not contain the key, attach a new entry at the end of the linked list (looping until next becomes null).
 Get
Find the bucket (linked list) using hash [key.hashCode() % BUCKET_ARRAY_SIZE].
Walk the entries until entry becomes null, returning the value if the key is found along the way.
      Entry.java

    package com.sk.custom_hashmap;

    public class Entry<K, V> {

        private final K key;
        private V value;
        private Entry<K, V> next;

        public Entry(K k, V v) {
            key = k;
            value = v;
        }

        public V getValue() {
            return value;
        }

        public void setValue(V value) {
            this.value = value;
        }

        public Entry<K, V> getNext() {
            return next;
        }

        public void setNext(Entry<K, V> next) {
            this.next = next;
        }

        public K getKey() {
            return key;
        }
    }
SkHashMap.java

    package com.sk.custom_hashmap;

    import java.util.HashSet;
    import java.util.Set;

    public class SkHashMap<K, V> {

        private int BUCKET_ARRAY_SIZE = 16; // initial capacity
        private double LOAD_FACTOR = 0.75;  // no. of entries / BUCKET_ARRAY_SIZE

        @SuppressWarnings("unchecked")
        private Entry<K, V>[] BUCKET_ARRAY = new Entry[BUCKET_ARRAY_SIZE];

        public void put(K key, V value) {
            int hash = hash(key); // finding the bucket
            Entry<K, V> entry = BUCKET_ARRAY[hash];
            if (null == entry) {
                /**
                 * Create a new Entry and add it to the array if the hash
                 * is not yet present in the bucket array.
                 */
                BUCKET_ARRAY[hash] = new Entry<K, V>(key, value);
            } else {
                /**
                 * Walk the chain: override the value if the key already
                 * exists anywhere in the list, otherwise append a new
                 * Entry at the end (i.e. when entry.getNext() == null).
                 */
                while (true) {
                    if (entry.getKey().equals(key)) {
                        entry.setValue(value);
                        return;
                    }
                    if (entry.getNext() == null) {
                        entry.setNext(new Entry<K, V>(key, value));
                        return;
                    }
                    entry = entry.getNext();
                }
            }
        }

        public V get(K key) {
            int hash = hash(key);
            Entry<K, V> entry = BUCKET_ARRAY[hash];
            while (null != entry) {
                if (entry.getKey().equals(key)) {
                    return entry.getValue();
                }
                entry = entry.getNext();
            }
            return null;
        }

        public Set<Entry<K, V>> getEntrySet() {
            Set<Entry<K, V>> set = new HashSet<Entry<K, V>>();
            for (Entry<K, V> entry : BUCKET_ARRAY) {
                while (entry != null) {
                    set.add(entry);
                    entry = entry.getNext();
                }
            }
            return set;
        }

        private int hash(K key) {
            // Mask off the sign bit so negative hashCode values do not
            // produce a negative array index.
            return (key.hashCode() & 0x7fffffff) % BUCKET_ARRAY_SIZE;
        }
    }
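
A quick driver exercising the map above. The class name SkHashMapDemo and the sample keys are made up for illustration:

    package com.sk.custom_hashmap;

    // A minimal driver exercising put/get on the custom map above.
    public class SkHashMapDemo {
        public static void main(String[] args) {
            SkHashMap<String, Integer> map = new SkHashMap<String, Integer>();
            map.put("one", 1);
            map.put("two", 2);
            map.put("one", 11); // overrides the existing value for "one"

            System.out.println(map.get("one"));   // 11
            System.out.println(map.get("two"));   // 2
            System.out.println(map.get("three")); // null (key absent)
        }
    }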
Load Factor
  • Load factor is a measure that decides when to increase the HashMap capacity (number of buckets) in order to keep the get and put operation complexity at O(1).
  • The default load factor of HashMap is 0.75 (i.e. 75% of the current map size).
  • The default capacity of HashMap is 2^4 = 16 buckets.
  • Say we have a well-implemented hashCode() method that distributes key-value pairs equally across the 16 buckets.
  • If there are 16 items in the HashMap, a good hashCode() method distributes 1 item per bucket, so searching for any item takes only 1 lookup. With 32 items, each bucket holds 2 items and a search takes at most 2 lookups. With 128 items, each bucket holds 8 items and a search takes at most 8 lookups.
  • If the number of items keeps increasing while the number of buckets stays fixed at 16, then at some point the performance of the HashMap starts degrading due to the large number of items in each bucket.
  • Now, say there are 500,000 items in the HashMap; a good hashCode() method will distribute 31,250 items into each bucket, so searching for any item will require up to 31,250 lookups.
  • If we increase the total number of buckets as the number of items per bucket starts growing, we can keep a roughly constant number of items in each bucket and maintain O(1) time complexity for get and put. The decision of when to increase the number of buckets is made by the load factor.

  • The default initial capacity of the HashMap is 16 and the default load factor is 0.75.
  • Load factor = number of entries in the map / capacity (number of buckets).
  • It is advisable to keep the load factor around 0.75 so that put and get stay around O(1).
  • Load factor and initial capacity (number of buckets) can be configured when creating the HashMap, as shown below:
    HashMap<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
  • Rehashing
    • Rehashing is the process of re-computing the bucket index of already stored entries (key-value pairs) in order to move them into a bigger bucket array when the load factor threshold is reached.
    • When the number of items in the map crosses the load factor threshold (capacity × load factor), the HashMap doubles its capacity and the bucket index of every stored element is re-computed, so that key-value pairs are evenly distributed across the new buckets. A small sketch of this threshold check follows below.
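
Here is a minimal, self-contained sketch of load-factor-driven rehashing, in the spirit of SkHashMap above. The class and field names are illustrative only (java.util.HashMap differs in detail), and the capacity is kept deliberately tiny so a resize actually fires:

    // A simplified rehashing sketch for String keys; duplicate-key
    // checks are omitted for brevity.
    public class RehashSketch {

        static class Node {
            final String key;
            String value;
            Node next;
            Node(String k, String v, Node n) { key = k; value = v; next = n; }
        }

        static final double LOAD_FACTOR = 0.75;
        static Node[] buckets = new Node[4]; // tiny capacity to force resizes
        static int size;

        static void put(String key, String value) {
            int index = (key.hashCode() & 0x7fffffff) % buckets.length;
            buckets[index] = new Node(key, value, buckets[index]); // prepend
            size++;
            if (size > buckets.length * LOAD_FACTOR) {
                resize();
            }
        }

        static void resize() {
            Node[] old = buckets;
            buckets = new Node[old.length * 2]; // double the capacity
            for (Node node : old) {
                while (node != null) {
                    Node next = node.next;
                    // Re-compute the bucket index against the new capacity.
                    int index = (node.key.hashCode() & 0x7fffffff) % buckets.length;
                    node.next = buckets[index];
                    buckets[index] = node;
                    node = next;
                }
            }
            System.out.println("Resized to " + buckets.length + " buckets at size " + size);
        }

        public static void main(String[] args) {
            for (int i = 0; i < 10; i++) {
                put("key" + i, "value" + i); // resizes fire on the 4th and 7th put
            }
        }
    }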
    How HashMap Collisions Occur

    • There are several Map classes in the JDK:
      • HashMap,
      • LinkedHashMap,
      • Hashtable,
      • WeakHashMap,
      • IdentityHashMap,
      • ConcurrentHashMap,
      • TreeMap, and
      • EnumMap.
    • Most of these are based upon the hash table data structure (TreeMap is backed by a red-black tree and EnumMap by an array, so they are the exceptions). The underlying working of the hash-based Maps is pretty much the same as discussed in "How does HashMap internally works in Java", apart from some minor differences in their specific behaviors. Since the hash table data structure is subject to collisions, all of these implementations are required to handle them.
    • A collision occurs when two different keys have the same hashCode, which can happen because two unequal objects in Java can have the same hashCode.
    • HashMap handles collisions by using a linked list to store the map entries that end up in the same array location, or bucket, as illustrated in the sketch following these collision notes.
    • From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap use a balanced binary tree in place of the linked list to handle frequent hash collisions. The idea is to switch to the balanced tree once the number of items in a hash bucket grows beyond a certain threshold (8 in the OpenJDK implementation). This improves the worst-case get() performance from O(n) to O(log n).
    • The legacy class Hashtable, which has existed in the JDK since Java 1, does not use the balanced binary tree to handle frequent hash collisions, keeping its iteration order intact. This was decided to avoid breaking the many legacy Java applications that depend upon the iteration order of Hashtable.
    • Apart from Hashtable, WeakHashMap and IdentityHashMap also keep their pre-Java-8 collision handling even in the case of frequent collisions (WeakHashMap chains entries in a linked list; IdentityHashMap probes for the next free slot).
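
    As an illustration of collisions, here is a small class with a deliberately constant hashCode(), forcing every key into the same bucket; both entries remain retrievable because the bucket chains them and distinguishes keys with equals(). The class names are made up for this example:

    import java.util.HashMap;
    import java.util.Map;

    // Every BadKey has the same hashCode, so all entries land in one
    // bucket and the map must chain (or treeify) them.
    public class CollisionDemo {

        static class BadKey {
            final String name;
            BadKey(String name) { this.name = name; }

            @Override public int hashCode() { return 42; } // deliberately constant

            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).name.equals(name);
            }
        }

        public static void main(String[] args) {
            Map<BadKey, String> map = new HashMap<>();
            map.put(new BadKey("a"), "first");
            map.put(new BadKey("b"), "second"); // collides with "a"
            System.out.println(map.get(new BadKey("a"))); // first
            System.out.println(map.get(new BadKey("b"))); // second
        }
    }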
    • TreeMap
      • The TreeMap class, like HashMap, stores key-value pairs. The major difference is that TreeMap keeps its keys sorted in ascending order (a short usage example follows these notes).
      • It sorts the TreeMap keys using the red-black tree algorithm.
      • We should read the pseudocode of the red-black algorithm in order to understand the internal implementation:
        • As the name of the algorithm suggests, the color of every node in the tree is either red or black.
        • The root node must be black.
        • A red node cannot have a red neighbor (parent or child) node.
        • All paths from the root node to a null leaf must contain the same number of black nodes.
        • Rotations preserve the in-order ordering of the keys (x, y, z).
        • A rotation can be performed in O(1) time.
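
      A short example of TreeMap's sorted iteration, using the standard java.util API (the sample data is made up):

    import java.util.Map;
    import java.util.TreeMap;

    // Keys come back in ascending order regardless of insertion order.
    public class TreeMapDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new TreeMap<>();
            map.put("banana", 2);
            map.put("apple", 1);
            map.put("cherry", 3);
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
            // Output: apple = 1, banana = 2, cherry = 3
        }
    }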

    • Why does Java's TreeMap not allow an initial size?
      • HashMap reallocates its internals as new entries are inserted, while TreeMap does not reallocate nodes when adding new ones. The size of the TreeMap increases dynamically as needed, without shuffling the internals, so it is meaningless to set an initial size for a TreeMap.
    • ConcurrentHashMap 
      • public ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)
      • The constructor above creates a new, empty map with the specified initial capacity, load factor and concurrency level.
      • initialCapacity - the initial capacity. The implementation performs internal sizing to accommodate this many elements.
      • concurrencyLevel - the estimated number of concurrently updating threads. The implementation performs internal sizing to try to accommodate this many threads.
      • In the ConcurrentHashMap API, you will find the following constants:
        • static final int DEFAULT_INITIAL_CAPACITY = 16;
        • static final int DEFAULT_CONCURRENCY_LEVEL = 16;
      • Instead of a map-wide lock, ConcurrentHashMap maintains a list of 16 locks by default (the number of locks equals the concurrency level, which is 16 by default), each of which is used to lock a single segment of buckets of the map. A brief usage sketch follows below.
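
      A brief example using that constructor; the argument values are illustrative only, and note that from Java 8 onwards the concurrencyLevel argument is used only as a sizing hint:

    import java.util.concurrent.ConcurrentHashMap;

    // Creates a map sized for ~64 elements with the default load factor
    // and an estimate of 8 concurrently updating threads.
    public class ConcurrentHashMapDemo {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> map =
                    new ConcurrentHashMap<>(64, 0.75f, 8);

            map.put("hits", 0);
            // Atomic read-modify-write without any external locking.
            map.compute("hits", (key, value) -> value == null ? 1 : value + 1);
            System.out.println(map.get("hits")); // 1
        }
    }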

