Sunday, 13 July 2014

Database Transactions, Lock Management and Deadlock - Part 1

Transactions

A "transaction" is defined as any one execution of a user program in a DBMS. It is a set of actions on some data object.

"Data objects" are the units in which programs read or write information. The units could be pages, records, and so on, but this is dependent on the DBMS.

There are four important properties of transactions that a DBMS must ensure to maintain data in the face of concurrent access and system failures:

1. Users should be able to regard the execution of each transaction as atomic: do whole or do nothing. Either all actions are carried out or none are. Users should not have to worry about the effect of incomplete transactions (say, when a system crash occurs).

2. Each transaction, run by itself with no concurrent execution of other transactions, must preserve the consistency of the database. This property is called consistency, and the DBMS assumes that it holds for each transaction. Keeping transaction consistent is the responsibility of the person writing the transaction query. F you do not write a proper transaction (i.e. if the logic used is incorrect) then though the transaction will be atomic but will not be consistent.

3. Users should be able to understand a transaction without considering the effect of other concurrently executing transactions, even if the DBMS interleaves the actions of several transactions for performance reasons. This property is referred to as isolation: Transactions are isolated, or protected, from the effects of concurrently scheduling other transactions.

4. Once the DBMS informs the user that a transaction has been successfully completed, its effects should persist even if the system crashes before all its changes are reflected on disk. This property is called durability.

Schedule

A transaction is seen by the DBMS as a series, or list, of actions. The actions that can be executed by a transaction include reads and writes of database objects.

In addition to reading and writing, each transaction must specify as its final action either commit (i.e., complete successfully) or abort (i.e., terminate and undo all the actions carried out far).

A schedule is a list of actions (reading, writing, aborting, or committing) from a set of transactions, and the order in which two actions of a transaction T appear in a schedule must be the same as the order in which they appear in T. E.g. Let transaction T1 consist of set of actions (a, b, c, d) then b will always come after a and before c.

A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule.

If the actions of different transactions are not interleaved i.e. transactions are executed from start to finish, one by one then the schedule is called as a serial schedule.

We want to have effect of serial schedule in the light of concurrency. We want serializable schedule.

Due to OS’s short-term scheduler we cannot decide which process will be given priority, what will be the time slot how many statements will be executed and how much time each statement will take for execution. Briefly, you cannot control the pace with which transactions move ahead. Any schedule S is acceptable if it gives effect same as executing transactions in isolation.

A serializable schedule over a set S is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over S, (where S is a set of committed transactions). That is, the database instance that results from executing the given schedule is identical to the database instance that will result from executing the transactions in some serial order.

Anomalies

1.  Dirty Read (WR)

The first source of anomalies is that a transaction T2 could read a database object A that has been modified by another transaction T1, which has not yet committed. Such a read is called a dirty read.

E.g. consider two transactions

T1: Transfer Rs. 100/- from account A to account B.
T2: Calculate 1% interest on each account.

Suppose T1 was invoked by a low priority user and Y2 is run by a user with high priority. When T2 came, T1 was evicted and T2 was given full control. The complete schedule is as follows.

T1
T2
Comment
R(A)

T1 reads values of A. A = 1000
W(A)

T1 updates the value of A.
A = 900

R(A)
A = 900                       

W(A)
A = 909

R(B)
B = 400

W(B)
B = 404

Commit

R(B)

B = 404
W(B)

B = 504
Commit


And if T1 aborts then -
T1
T2
Comment
R(A)

T1 reads values of A. A = 1000
W(A)

T1 updates the value of A.
A = 900

R(A)
A = 900                       

W(A)
A = 909

R(B)
B = 400

W(B)
B = 404

Commit

Abort

A = 1009

The result of this schedule is different from any result that we would get by running one of the two transactions first and then the other. The problem can be traced to the fact that the value of A written by T1 is read by T2 before T1 has completed all its changes.

2. Unrepeatable Reads (RW Conflicts)

The second way in which anomalous behavior could result is that a transaction T2 could change the value of an object A that has been read by a transaction T1, while T1 is still in progress. This situation could not arise in a serial execution of two transactions; it is called an unrepeatable read.T1 is still in progress. If T1 tries to read the value of A again, it will get a different result, even though it has not modified A in the meantime.
Another problem is illustrated by this example.

Let T1: A = A + 1 and T1: A = A -1 1

T1
T2
Comment
R(A)

A = 5

R(A)
A = 5                           
W(A)

A = 6

W(A)
A = 4

Commit

Commit



If we execute T1 and T2 in any serial order then effectively the value of A remains the same but here it is different. Hence the database is not inconsistent state.
  
3. Overwriting Uncommitted Data (WW Conflicts)

This problem occurs when you loose data written by one of the transactions. It is lost and never recorded. Suppose that A and B are two employees, and their salaries must be kept equal.

Transaction T1 sets their salaries to $1,000 and transaction T2 sets their salaries to $2,000. Note that neither transaction reads a salary value before writing it, such a write is called a blind write.

If we execute these in the serial order T1 followed by T 2, both receive the salary $2,000; the serial order T2 followed by T 1 gives each the salary $1,000. Either of these is acceptable from a consistency standpoint.

Now, consider the following interleaving of the actions of T1 and T2.

T1
T2
Comment
W(A)

A = 1000

W(B)
B = 2000                     
W(B)

B = 1000

W(A)
A = 2000

Commit

Commit



The result is not identical to the result of either of the two possible serial executions, and the interleaved schedule is therefore not serializable. It violates the desired consistency criterion that the two salaries must be equal.

No comments:

Post a Comment

Your comments are very much valuable for us. Thanks for giving your precious time.