Collision occurs when
- Given key range is too large and a fraction of it will be used. There will be a group of valid key values close to each other, then a long invalid range, then a small group of valid keys and so on.
- Some kind of intelligence is provided in key (e.g. Roll Number) then entire range will never be used.
Resolving Collisions
Way 1 – Series of hash functions
If h1 (k1) = h2 (k2) then apply different hash function on key values as h2 (k1) and h2 (k2). Here they may not collide but this is not guaranteed. If again collision takes place then go for h3. Hence in this technique we need to have a series of hash function.
Insertion
Apply hash function on key1; h (key1) -> A1 and so on. As key2 comes; h (key2) -> A1.
Since there is a collision therefore move to next address space.
Way 2 – Linear Probing
Insertion
Apply hash function on key1; h (key1) -> A1 and so on. As key2 comes; h (key2) -> A1.
Since there is a collision therefore move to next address space.
If empty as (in case of redundancy spread across) then place the records with key2 Else read the key value of the record stored; let the key value be keyX. Apply hash function on keyX. If h (keyX) -> A1 then Go to next address space and repeat the same procedure Else Go to general overflow area and place the record at the next free address space End If End If
Theoretically this technique is good for insertion but searching sometimes gets very exhaustive.
Search
Let the key value of the record to be searched is keyX.
Apply hash function on keyX; h (keyX) -> AX Go to address space AX. If (Key of record stored at address space AX == KeyX) then Record found Else Go to next address space. If (Key of record stored at this new address space == KeyX) then Record found Else Apply hash function on this new key value; KeyY. If (h (KeyY) == AX) then Go to next address space and repeat the same procedure Else Go to general overflow area Traverse through the general overflow area, record by record, matching the key values, till the match is found or over flow area ends (this indicates record not found) End If End If End If
This is called as linear probing. First searching is main area, then entire overflow area and still if record is not found then the hashing feature of quickly figuring the address of record is washed out. Too large and full overflow area will slow down the search operation and makes the search clumsy.
Deletion
If keys get deleted then there will be spaces in between. After lots of deletion or when overflow area gets full and vacancies is somewhere in main area, then put records in appropriate place and clean the general overflow area.
If there is no redundancy (i.e. empty space in main area after the key which the key of record to be placed is collided) and overflow area too is full then
- Increase overflow area
- Change the hash function (This is better option)
If hash function is changed, the file needs to be reorganized completely. At the time of reorganization the file is not made available to any process. Large file will cost a lot of time due to large number of record reorganization. Hence this task should be performed at spare time.
Way 3 – Bucketing
Bucket |
A bucket is capable of holding more than one record. Within a bucket it is always a linear search. Larger buckets increase the cost of linear search. If you are certain that on an average three key values will land on same address space then define a bucket of size 3. The address will be the address of the bucket. After placing three keys if any further collision takes place
e.g. h (key4) -> A1 then create another bucket and set the pointer of the previous bucket to this new bucket.
Now the bucket can store 6 key values. If the new bucket also gets filled then repeat the same procedure of creating new bucket and setting the pointer. Hence a bucket is like a linked list. A bucket pointing to another bucket is like a node pointing to another node.
To get a record, apply hash function on the key value, go to the bucket with the resultant address, perform linear search and fetch the record.
As we allocated space for 3 records in a bucket but it holds only 1 record (key 4) then remaining space will be wasted. Hence another approach is to define a common overflow area. Its size should be greater than individual buckets. If bucket is full then instead of allocating a new bucket, place the record in this overflow area. While searching, if a record is not found in its bucket then land in overflow area and perform linear search.
No comments:
Post a Comment
Your comments are very much valuable for us. Thanks for giving your precious time.