Tuesday, 10 July 2012

Hashing : How to Resolve Collisions


Collision occurs when
  1. Given key range is too large and a fraction of it will be used. There will be a group of valid key values close to each other, then a long invalid range, then a small group of valid keys and so on.
  2. Some kind of intelligence is provided in key (e.g. Roll Number) then entire range will never be used.

Resolving Collisions

Way 1 – Series of hash functions

If h1 (k1) = h2 (k2) then apply different hash function on key values as h2 (k1) and h2 (k2). Here they may not collide but this is not guaranteed. If again collision takes place then go for h3. Hence in this technique we need to have a series of hash function.

Way 2 – Linear Probing




Insertion
Apply hash function on key1; h (key1) -> A1 and so on. As key2 comes; h (key2) -> A1. 

Since there is a collision therefore move to next address space.
If empty as (in case of redundancy spread across) then 
        place the records with key2 
Else
	read the key value of the record stored; let the key value be keyX.
	Apply hash function on keyX.
	If h (keyX) -> A1 then
		Go to next address space and repeat the same procedure
	Else
		Go to general overflow area and place the record at the next free address space
	End If
End If

Theoretically this technique is good for insertion but searching sometimes gets very exhaustive.

Search
Let the key value of the record to be searched is keyX.

Apply hash function on keyX; h (keyX) -> AX
Go to address space AX.
If (Key of record stored at address space AX == KeyX) then
	Record found
Else
	Go to next address space.
	If (Key of record stored at this new address space == KeyX) then
		Record found
	Else
		Apply hash function on this new key value; KeyY.
		If (h (KeyY) == AX) then
			Go to next address space and repeat the same procedure
		Else
			Go to general overflow area
Traverse through the general overflow area, record by record, matching the key values, till the match is found or over flow area ends (this indicates record not found)
		End If
	End If
End If

This is called as linear probing. First searching is main area, then entire overflow area and still if record is not found then the hashing feature of quickly figuring the address of record is washed out. Too large and full overflow area will slow down the search operation and makes the search clumsy.

Deletion

If keys get deleted then there will be spaces in between. After lots of deletion or when overflow area gets full and vacancies is somewhere in main area, then put records in appropriate place and clean the general overflow area.


If there is no redundancy (i.e. empty space in main area after the key which the key of record to be placed is collided) and overflow area too is full then

  1. Increase overflow area
  2. Change the hash function (This is better option)

If hash function is changed, the file needs to be reorganized completely. At the time of reorganization the file is not made available to any process. Large file will cost a lot of time due to large number of record reorganization. Hence this task should be performed at spare time.

Way 3 – Bucketing

Bucket
A bucket is capable of holding more than one record. Within a bucket it is always a linear search. Larger buckets increase the cost of linear search. If you are certain that on an average three key values will land on same address space then define a bucket of size 3. The address will be the address of the bucket. After placing three keys if any further collision takes place 

e.g. h (key4) -> A1 then create another bucket and set the pointer of the previous bucket to this new bucket.

Now the bucket can store 6 key values. If the new bucket also gets filled then repeat the same procedure of creating new bucket and setting the pointer. Hence a bucket is like a linked list. A bucket pointing to another bucket is like a node pointing to another node.

To get a record, apply hash function on the key value, go to the bucket with the resultant address, perform linear search and fetch the record.

As we allocated space for 3 records in a bucket but it holds only 1 record (key 4) then remaining space will be wasted. Hence another approach is to define a common overflow area. Its size should be greater than individual buckets. If bucket is full then instead of allocating a new bucket, place the record in this overflow area. While searching, if a record is not found in its bucket then land in overflow area and perform linear search.

No comments:

Post a Comment

Your comments are very much valuable for us. Thanks for giving your precious time.