Transaction Isolation Levels & Concurrency Anomalies
An explanation of database isolation levels and common concurrency anomalies like write skew and phantom reads.
Introduction
Modern databases allow many transactions to run concurrently. Without proper coordination, concurrent reads and writes can lead to:
- Inconsistent reads
- Lost updates
- Corrupted business logic
- Violated constraints
This is where transaction isolation and concurrency control become critical.
Isolation defines how and when the changes made by one transaction become visible to others.
Different isolation levels provide different guarantees, and allow different types of anomalies, balancing:
- correctness
- performance
- concurrency
Concurrency Anomalies
Before understanding isolation levels, it helps to understand the problems databases are trying to prevent.
1. Dirty Read
A dirty read happens when a transaction reads data that has been written by another transaction that has not committed yet.
Example:
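The original illustration is missing here; as a minimal sketch, using a toy in-memory store standing in for a real database (the `balance` column is a hypothetical example):

```python
# Toy illustration of a dirty read: B observes A's uncommitted write.
# A real database at Read Committed would instead return the last
# committed version (100) to B.

committed = {"balance": 100}      # last committed state
uncommitted = dict(committed)     # Transaction A's working copy

# Transaction A writes but has not committed yet
uncommitted["balance"] = 500

# Transaction B reads the *uncommitted* value -> dirty read
dirty_value = uncommitted["balance"]

# Transaction A rolls back: the write is discarded
uncommitted = dict(committed)

print(dirty_value)            # 500: a value that was never committed
print(committed["balance"])   # 100: the only value that ever existed after commit
```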
If Transaction A later rolls back, Transaction B has read invalid data.
Why it is dangerous:
- The uncommitted data might be rolled back
- Leads to reading invalid or inconsistent data
How databases prevent it:
- Do not expose uncommitted data to other transactions
- Return the last committed version of the row instead (MVCC in many systems)
Prevented by:
- Read Committed Isolation and stronger isolation levels
2. Dirty Write
A dirty write occurs when two concurrent transactions update the same row and one overwrites the other's uncommitted write.
Example:
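A minimal sketch with the same toy store; the interleaving follows the common car-sale illustration (the listing, buyer, and invoice columns are assumptions):

```python
# Toy illustration of a dirty write: two transactions overwrite the same
# row before either commits. With row-level write locks, the second
# writer would block until the first commits or rolls back.

row = {"listing": "car", "buyer": None, "invoice_to": None}

# Transaction A (Alice) and Transaction B (Bob) interleave their writes:
row["buyer"] = "Alice"        # A's uncommitted write
row["buyer"] = "Bob"          # B overwrites A's uncommitted write
row["invoice_to"] = "Alice"   # A's second write lands after B's

# Both "commit": the row is now internally inconsistent.
print(row)  # the buyer is Bob, but the invoice goes to Alice
```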
Without proper locking, one update may silently overwrite the other.
Why it is dangerous:
- Causes lost updates
- Final result depends on timing (race condition)
- Violates data consistency
How databases prevent it:
- Row-level write locks - only one transaction can modify a row at a time.
- Two-Phase Locking (2PL) - ensures that locks are held until commit/rollback.
Trade-offs:
- Can lead to blocking
- May cause deadlocks (databases resolve them by aborting one transaction)
3. Non-Repeatable Read
A non-repeatable read occurs when the same query returns different results within the same transaction, because another transaction committed changes between the two reads.
Example
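A minimal sketch (toy in-memory store, hypothetical `balance` column): Transaction A reads the same row twice, and Transaction B commits in between.

```python
# Toy illustration of a non-repeatable read: the same query inside one
# transaction returns different values because another transaction
# committed between the two reads.

committed = {"balance": 100}

# Transaction A, first read
first_read = committed["balance"]

# Transaction B updates the row and commits
committed["balance"] = 50

# Transaction A, second read of the same row
second_read = committed["balance"]

print(first_read, second_read)  # 100 50 -> the read was not repeatable
```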
Why it matters
The transaction cannot rely on previously read values remaining stable.
Prevented by
- Snapshot Isolation
- Repeatable Read
- Serializable Isolation
4. Phantom Read
A phantom read occurs when re-executing a query within the same transaction returns a different set of rows, because another transaction inserted or deleted rows matching the query's condition.
Example:
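A minimal sketch, using a hypothetical `payments` list and an `amount > 1000` predicate:

```python
# Toy illustration of a phantom read: re-running the same predicate query
# returns an extra ("phantom") row inserted by a concurrent transaction.

payments = [{"id": 1, "amount": 1500}, {"id": 2, "amount": 300}]

def large_payments():
    return [p for p in payments if p["amount"] > 1000]

first_result = large_payments()            # 1 matching row

# A concurrent transaction inserts a matching row and commits
payments.append({"id": 3, "amount": 2000})

second_result = large_payments()           # 2 rows: a phantom appeared
print(len(first_result), len(second_result))
```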
The additional row is called a phantom row.
Why it matters
Even if existing rows are unchanged, the result set itself changes.
Prevented by
- Serializable isolation
- Index range locking
5. Write Skew
Write skew occurs when two concurrent transactions read overlapping data and then update different rows based on those reads.
This is one of the most important anomalies allowed under Snapshot Isolation.
Example:
Assume: At least one doctor must remain on-call.
Then:
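A minimal sketch of the scenario (doctor names and the on-call flag are illustrative): both transactions check the constraint against the same snapshot, then each takes a different doctor off call.

```python
# Toy illustration of write skew: both transactions read the same snapshot
# (two doctors on call), each updates a *different* row, and the
# "at least one doctor on call" constraint ends up violated.

snapshot = {"alice": True, "bob": True}   # on-call status both transactions read

def go_off_call(state, doctor, snap):
    # Each transaction validates the constraint against its own snapshot...
    if sum(snap.values()) >= 2:
        state[doctor] = False             # ...then updates a different row

live = dict(snapshot)
go_off_call(live, "alice", snapshot)      # Transaction A
go_off_call(live, "bob", snapshot)        # Transaction B (same stale snapshot)

print(sum(live.values()))  # 0 doctors on call: constraint violated
```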
Both transactions commit successfully.
Final result: No doctors are on call, violating the constraint.
Why it happens:
- Both transactions read the same snapshot
- They update different rows based on that snapshot
- No direct write-write conflict occurs
Why locks don’t always help:
- Locks typically apply to rows being modified
- In write skew, transactions modify different rows
- So no conflict is detected, even though a global constraint is violated
Solution:
- Lock based on conditions, not just rows
- Example: lock all rows that satisfy "doctors on call"
Prevented by:
- Serializable isolation
- Strict Two-Phase Locking (2PL)
6. Phantom Writes (a Write Skew Variant)
A special case related to phantoms is phantom writes, where two transactions insert new rows that conflict with each other.
Situation:
- Two transactions run concurrently
- Both check for a condition (e.g., "no booking exists for this time slot")
- Since the row does not yet exist, no locks are held
What happens:
- Both transactions believe the insert is valid
- Both insert new rows
- Result: constraint is violated (e.g., double booking) ❌
Why it happens:
- The conflicting data doesn’t exist yet
- Locks cannot be applied to something that isn’t there
- Each transaction works on its own view of the data
Solution: Materialize Conflicts
- Pre-create rows representing potential conflicts
- Example: create all available meeting room slots in advance
- Transactions then lock existing rows instead of inserting new ones
Isolation Levels
Isolation levels define how strongly a database protects transactions from concurrency anomalies.
From weaker to stronger:
- Read Committed Isolation
- Snapshot Isolation / Repeatable Read
- Serializable Isolation (Serial Execution)
Higher isolation levels:
- Improve correctness
- Reduce concurrency
- Usually increase locking or coordination overhead
Read Committed Isolation
Read Committed Isolation ensures a transaction only sees committed data.
Prevents:
- Dirty reads (reading uncommitted data)
Still allows:
- Non-repeatable reads
- Phantom reads
- Write skew
How it works:
Most databases implement Read Committed using:
- MVCC (Multi-Version Concurrency Control) to provide a consistent view of committed data
- Row-level locks for writes to prevent dirty writes
Each statement sees the latest committed data at the time it executes, but different statements within the same transaction may see different data if other transactions commit in between.
Example:
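A minimal sketch of Read Committed visibility (toy store, hypothetical `price` column): each statement consults the latest committed state, and uncommitted writes are never exposed.

```python
# Sketch of Read Committed: each statement sees the latest *committed*
# value; in-flight writes from other transactions stay invisible.

committed = {"price": 10}
uncommitted = {"price": 99}   # another transaction's in-flight write (ignored)

def read_committed_read(key):
    return committed[key]     # uncommitted changes are never exposed

first = read_committed_read("price")   # 10
committed["price"] = 20                # the other transaction commits
second = read_committed_read("price")  # 20: newer committed data
print(first, second)
```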
The second read sees newer committed data.
Snapshot Isolation (SI)
Snapshot Isolation provides each transaction with a consistent snapshot of the database at the time it starts.
Core Idea
Instead of reading live data, transactions read a historical snapshot.
This is usually implemented using:
- MVCC
- Transaction timestamps
- Row versions
How MVCC works:
Databases maintain multiple versions of rows.
Each version contains metadata such as:
- The transaction ID that created it
- The transaction ID that deleted it (if applicable)
- Commit timestamps
Visibility rules then determine which version a transaction sees based on its snapshot timestamp.
When a transaction starts:
- It receives a snapshot timestamp
- It only sees rows committed before that timestamp
This avoids many read locks and improves concurrency.
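The visibility check can be sketched as follows (an assumed simplification: commit timestamps only, with no in-progress or deleted-by bookkeeping):

```python
# Minimal MVCC visibility sketch: a row keeps multiple versions, and a
# transaction sees the newest version committed before its snapshot.

versions = [
    {"value": 100, "commit_ts": 5},
    {"value": 250, "commit_ts": 12},
    {"value": 300, "commit_ts": 20},
]

def visible_value(snapshot_ts):
    # Keep only versions committed at or before the snapshot timestamp...
    vis = [v for v in versions if v["commit_ts"] <= snapshot_ts]
    # ...and return the newest of those.
    return max(vis, key=lambda v: v["commit_ts"])["value"]

print(visible_value(10))  # 100: only the ts=5 version is visible
print(visible_value(25))  # 300: the newest committed version
```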
Prevents:
- Dirty reads (reading uncommitted data)
- Non-repeatable reads
Still allows:
- Write skew
- Some phantom-like anomalies (depending on implementation)
Example (inconsistent read across accounts):
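A minimal sketch, following the classic illustration of a 100-unit transfer between two accounts that should always total 1000 (account names are assumptions):

```python
# Under Read Committed a reader can observe one account before the
# transfer and one after; under SI both reads come from one snapshot.

before = {"checking": 500, "savings": 500}   # state at transaction start
after  = {"checking": 600, "savings": 400}   # state after a concurrent transfer commits

# Read Committed: each statement sees the latest committed data, so the
# two reads can straddle the transfer:
rc_total = before["checking"] + after["savings"]   # 500 + 400 = 900

# Snapshot Isolation: both statements read the same snapshot:
si_total = before["checking"] + before["savings"]  # 1000

print(rc_total, si_total)
```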
In short: under SI, we read a snapshot, not the live table.
Notes on implementation details:
- MVCC is the key mechanism for SI (versioned rows with transaction IDs)
- Write-Ahead Logging (WAL) ensures durability and atomicity, but it is not what provides snapshot isolation
Serializable Isolation
Serializable isolation is the strongest standard isolation level.
It guarantees that transactions behave as if they were executed one after another, even if they actually run concurrently.
What it prevents:
- Dirty reads
- Dirty writes
- Non-repeatable reads
- Phantom reads
- Write skew
Serializable isolation provides the highest correctness guarantee.
Trade-off:
Serializable isolation may:
- Reduce concurrency
- Increase locking
- Increase transaction aborts
- Cause contention under heavy workloads
However, it is essential for:
- Financial systems
- Inventory systems
- Critical consistency constraints
Two-Phase Locking (2PL)
One classical way databases achieve serializable behavior is through Two-Phase Locking.
2PL is a locking protocol that controls how locks are acquired and released.
The two phases:
- Growing Phase: A transaction may acquire locks but cannot release any locks.
- Shrinking Phase: Once a transaction releases its first lock, it cannot acquire any new locks.
This guarantees a consistent locking order, preventing cycles and ensuring serializability.
Example:
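A minimal sketch of the two phases (an assumed toy lock manager with no blocking or queues): acquiring a lock is legal only until the first release.

```python
# Sketch of 2PL: once the shrinking phase starts (first release),
# acquiring any new lock is a protocol violation.

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # flips after the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True    # growing phase is over
        self.held.discard(item)

t = TwoPhaseTxn()
t.acquire("row1")
t.acquire("row2")        # growing phase: fine
t.release("row1")        # shrinking phase begins

violated = False
try:
    t.acquire("row3")    # illegal under 2PL
except RuntimeError:
    violated = True
print(violated)          # True
```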
Locks are held until the shrinking phase begins.
Strict Two-Phase Locking (Strict 2PL)
Most databases actually use Strict Two-Phase Locking.
Under Strict 2PL:
- Exclusive locks are held until commit/rollback
- Prevents dirty reads and dirty writes
- Improves recoverability
This is commonly used in:
- SQL Server
- Traditional relational database systems
Advantages of 2PL:
- Strong consistency guarantees
- Prevents many concurrency anomalies
- Ensures serializability
Disadvantages of 2PL:
- Blocking: transactions may wait for locks
- Deadlocks: two transactions may wait for each other forever
- Reduced concurrency: heavy locking reduces throughput
Example:
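A minimal sketch of deadlock detection with a wait-for graph (transaction names and the victim-selection rule are illustrative, real databases use cost-based victim choices):

```python
# A cycle in the wait-for graph means deadlock; the database breaks it
# by aborting one participant (the "victim").

wait_for = {"T1": "T2", "T2": "T1"}   # T1 waits for T2's lock, and vice versa

def find_deadlock(graph, start):
    seen, node = set(), start
    while node in graph:
        if node in seen:
            return True        # revisited a node: cycle -> deadlock
        seen.add(node)
        node = graph[node]
    return False

deadlocked = find_deadlock(wait_for, "T1")
victim = "T2" if deadlocked else None   # abort one transaction to break the cycle
print(deadlocked, victim)
```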
Databases detect deadlocks and abort one transaction.
Index Range Locking
To prevent phantom reads, some databases use index range locking.
This allows transactions to lock a range of rows based on an index, preventing other transactions from inserting new rows that would satisfy the same query conditions.
The problem with row locking alone:
Suppose a transaction executes a range query such as `SELECT * FROM payments WHERE amount > 1000`.
Even if all existing matching rows are locked:
- Another transaction can still insert a new row with `amount = 2000`
- The next query sees an additional row
- A phantom read occurs
How index range locking works:
Instead of locking only rows:
The database locks:
- Existing rows
- The gaps between indexed values
- Entire index ranges
This prevents inserts into the protected range.
Example:
Assume an index on amount.
Transaction A runs a range query, for example `SELECT * FROM payments WHERE amount BETWEEN 1000 AND 3000`.
The database may lock:
- Rows inside the range
- Gaps within the index range
Now Transaction B cannot insert a row that falls inside that range (for example, `amount = 2000`) until Transaction A commits.
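A minimal sketch of the idea (a toy range-lock check, not a real lock manager; the 1000..3000 range on `amount` is illustrative):

```python
# Toy sketch of an index range (gap) lock: while Transaction A holds a
# lock on the predicate range, inserts into that range are rejected.

locked_ranges = [(1000, 3000)]   # held by Transaction A

def try_insert(amount):
    for lo, hi in locked_ranges:
        if lo <= amount <= hi:
            return False          # blocked by the range lock
    return True

print(try_insert(2000))  # False: falls inside A's locked range
print(try_insert(5000))  # True: outside the range, the insert proceeds
```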
Why it matters:
Index range locking is critical for:
- Serializable isolation
- Preventing phantom reads
- Enforcing business constraints safely
Without range locking:
- Row-level locking alone is insufficient
- New rows can bypass existing locks
Materializing Conflicts
Materializing conflicts is a design pattern to prevent write skew and phantom writes by pre-creating rows that represent potential conflict points.
Why this works:
- Now conflicts happen on existing rows
- Locks can be applied
- Prevents race conditions at the cost of extra design complexity
How to design it (practical tips):
- Choose a conflict key: define what uniquely represents the constraint
  - e.g., `(room_id, timeslot)`, `(doctor_id, shift_date)`, `(user_id, day)`
- Pre-populate rows for all valid combinations
  - e.g., generate all `(room_id, timeslot)` pairs for the next 30 days
- Update instead of insert
  - booking becomes: `UPDATE slots SET booked_by = ? WHERE room_id = ? AND timeslot = ? AND booked_by IS NULL`
- Use row-level locks
  - `SELECT ... FOR UPDATE` on the target row before updating
- Enforce with constraints (belt and suspenders)
  - unique indexes or check constraints to backstop the application logic
Example (meeting rooms):
- Table `room_slots(room_id, timeslot, booked_by)`
- Pre-create all rows with `booked_by = NULL`
- Booking flow:
  - `SELECT * FROM room_slots WHERE room_id = ? AND timeslot = ? FOR UPDATE`
  - If `booked_by IS NULL`, run `UPDATE ... SET booked_by = user`
  - Commit
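The booking flow can be sketched with SQLite (table and column names follow the meeting-room example above; SQLite has no `SELECT ... FOR UPDATE`, so this version relies on the atomic conditional `UPDATE` alone, and the timeslot value is illustrative):

```python
import sqlite3

# Materialized-conflict booking: the slot row exists up front, so the
# conflict happens on an existing row instead of a racing insert.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE room_slots (
    room_id INTEGER, timeslot TEXT, booked_by TEXT,
    PRIMARY KEY (room_id, timeslot))""")
conn.execute("INSERT INTO room_slots VALUES (1, '10:00', NULL)")

def book(user):
    # Only one booker's UPDATE can match the booked_by IS NULL row.
    cur = conn.execute(
        "UPDATE room_slots SET booked_by = ? "
        "WHERE room_id = 1 AND timeslot = '10:00' AND booked_by IS NULL",
        (user,))
    conn.commit()
    return cur.rowcount == 1   # True only for the successful booker

ok_alice = book("alice")
ok_bob = book("bob")
print(ok_alice, ok_bob)  # True False -> no double booking
```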
Pros:
- Works with weaker isolation levels (e.g., Read Committed Isolation)
- Predictable locking behavior (on known rows)
Cons:
- Requires upfront modeling of all possible conflicts
- Can increase storage (many pre-created rows)
- Less flexible for highly dynamic constraints
Alternatives (when available):
- Unique indexes on computed keys (e.g., `(room_id, timeslot)`)
- Exclusion constraints (PostgreSQL) for range conflicts (e.g., time intervals)
- Serializable isolation to let the database detect and abort conflicting transactions
Comparing Isolation Levels
| Isolation Level | Dirty Reads | Non-repeatable Reads | Phantom Reads | Write Skew |
|---|---|---|---|---|
| Read Committed Isolation | ✅ Prevented | ❌ Possible | ❌ Possible | ❌ Possible |
| Snapshot Isolation | ✅ Prevented | ✅ Prevented | ❌ Possible | ❌ Possible |
| Serializable Isolation | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented |
MVCC vs Lock-Based Concurrency Control
| Aspect | MVCC | Lock-Based |
|---|---|---|
| Read Behavior | Reads snapshots | Reads may wait for locks |
| Reader/Writer Blocking | Minimal | Common |
| Write Conflicts | Detected during commit or updates | Prevented through locks |
| Performance | Excellent for read-heavy systems | Good for controlled write-heavy systems |
| Storage Usage | Higher due to row versions | Lower |
| Deadlocks | Less common | More common |
| Phantom Prevention | Requires Serializable mechanisms | Uses range/index locks |
| Complexity | Higher implementation complexity | Easier conceptual model |
Real Database Examples
| Database | Main Concurrency Model |
|---|---|
| PostgreSQL | MVCC |
| MySQL InnoDB | MVCC + Locking |
| Oracle | MVCC |
| SQL Server | Primarily Lock-Based with MVCC options |
Most modern databases combine:
- MVCC for high concurrency
- Locking for conflict resolution and serializability
Choosing Between MVCC and Locking
The best approach depends on workload characteristics.
MVCC is ideal for:
- Read-heavy applications
- APIs with many concurrent users
- Systems prioritizing responsiveness
Locking is ideal for:
- Strong consistency requirements
- High-conflict transactional systems
- Workloads with frequent write contention
Final Thoughts
MVCC and lock-based concurrency control solve the same problem: safely executing concurrent transactions. Their trade-offs, however, differ significantly.
There is no "best" isolation level, only trade-offs.
- Use Read Committed Isolation for performance with basic safety
- Use Snapshot Isolation for better consistency with high concurrency
- Use Serializable Isolation when correctness is critical
The key is understanding which anomalies your system can tolerate, and which it cannot.