Saturday, 19 May 2012

File Organization Basics [Data Structures]

In data structure, file is a collection of records one after another and having fixed length.
  • Insertion – At the end;
  • Deletion – Anywhere;
  • Search – Record by record;
In Data Structure it is called as heap, in OS it is called as sequential file. OS visualizes file as a series of blocks as it is not concerned about the contents of the file. A block is generally of the size of sector defined on disk. Disk can be hard sector (predefined sectors generally 8) or soft sector (user defined sectors which we use). OS does everything (deleting or appending records) in terms of blocks.

If we consider a disk with 8 sectors then every track will have sectors numbered from 1 to 8. Contents of the file (block) is defines using track number - sector number combination. FAT maintains this entry indicating the ith block (bi) is stored at track n and sector j.

Space to a file cannot be allocated in advance as it may grow beyond expectations. Hence every time when a block is required OS looks for empty block (a mark is associated with the block indicating its availability). When an available block is encountered then it is given to the file, an entry is made in the FAT and the block’s entry is deleted from available list; and finally file contents are appended.

If the FAT is too long then locating a block with desired data will be an exhaustive search. Hence in UNIX, the first 10 entries of DAT are direct addresses (directly pointing to data blocks). The 11th block will store the address of the block holding addresses of other data blocks. The 12th block will point to a block, which is pointing to other blocks holding addresses of data blocks.

Direct
Single Indirect
Double Indirect
Triple Indirect
10
500
500 X 500 = 5002
500 X 500 X 500 = 5003

Note: The number 500 will differ from OS to OS, installation to installation, machine to machine. It is size of block/size of each address.

This system supports small size files and at the same time it can support large files too. It provides quick access to small files and at the same time supports big files with little reduced efficiency. The table with 10+3 entries has to be memory resident. The direct addressing blocks can be configured if average file size is smaller or larger than predefined 10 direct blocks. A 4th level of indirection can also be added using 14th block. Levels of indirection are added in incremental manner.

Hence the advantages of UNIX FAT over traditional FAT are
  1. FAT need to be memory resident and huge FAT can’t be accommodated.
  2. Searching through big size FAT – As it won’t fit on memory so you need to perform multiple disk accesses and at the end you may not be having the address you were looking for. We can’t spend too much time on just locating the address of a block.
Disk Access & Block Organization

As head passes through blocks, contents are read, loaded into disk buffer and then transferred to RAM. Once the disk buffer is full, it needs some time to get flushed before other data can be placed in it.



Physical Organization


Logical Organization


In this organization, blocks are stored one after another. The R/W head moves with a constant speed. After reading b1, the disk buffer is full and the R/W head is in b2 which is the next block to be read. As we need some time to empty the disk buffer hence this organizations is not efficient.

In this organization, blocks are placed at a distance of two blocks. After reading b1, the disk buffer is full. The R/W head has to skip two blocks and during this time the disk buffer can be flushed. Hence this is an efficient block organization.


Track no. is the thing that expects physical movement of head which occurs in a jumpy manner. The track no. access requests need to be optimized with the intention of minimizing head movement on disk. Hence different disk scheduling techniques are used. As the disk scheduling algorithms optimize the head movement therefore a request placed fists may be fulfilled later.

Record Number -> Block Number -> (Track Number, Sector Number) -> Disk



If records are of fixed size then a fixed n records can be placed on a block. Never ever split record across blocks. Now translate record no. to block no. using simple calculation. E.g. if 2 records are on each block then 9th record will be on 4th block.
  • B Tree: Key Value -> Record id (rid) -> Block id (bid) -> (Track number, Sector Number)
  • Hashing: h (Key Value) - > Record id (rid) -> Block id (bid) -> (Track number, Sector Number)
Logical & Physical Address Space – The record with key value 50 is logical view of record. rid to bid is logical address translation. Obtaining (Track Number, Sector Number) from bid is physical address translation.

In Hashing with bucketing the rid itself will be bid. Bucket Address = Block no. Hence h(key) -> bid -> ts.

No comments:

Post a Comment

Your comments are very much valuable for us. Thanks for giving your precious time.