Tuesday, 10 July 2012

Hash Function : Mid Square Method

In this method, when they key is of big size then its square will double the number of digits; of which we select few middle digits to form the address.

Example: 
94522 = 89340304: address is 3403

Why Middle Digit?

Last few digits of a squared roll number will be matching across classes (FY, SY and TY). Hence if we consider least significant digits then there will be collisions.

If we consider most significant digits then at times most significant digits

  • Will be matching for squares of 112010001, 112010002, …
  • Will be wide apart for squares of 102010001, 1120100201, and 122010001.
Hence when you consider least significant digits, there will be collision and when most significant digits are considered there might be collision along with wasting address due to sparse. 

No square will end on 2, 3, 7, and 8. The address spaces ending on 2, 3, 7, and 8 will never be used. This is another reason why we do not consider least significant digits.

The solution is – take a key, square it, chop 2-3 digits from each end and the middle remaining past will be the address. The chance of collision is pretty less. Hence the distribution will be stable, normalize and within a narrower range.

Loading Factor:

Loading factor is number of records in a file divided by maximum number of records that can be accommodated in a file. It will be a fraction and the value will be <=1.

E.g. If 50,000 records can be stored in address range of 1 million then the LF is 0.5

The advantage is that the file can grow beyond anticipation at later point of time.

Speeds:

In this method, square and chopping goes at electronic speed + disk access will be at electronic/mechanical speed (Mechanical because there will be rotation). 

Note: Electronic speed is greater than mechanical speed.

In B Tree we have 4 disk accesses but here only one disk access. Hence speed is fast in hashing.
The disadvantage of this method is there will be collisions but we can try to reduce collisions as much as possible. Another limitation is the size of the key.

No comments:

Post a Comment

Your comments are very much valuable for us. Thanks for giving your precious time.