Understanding PSQL's MVCC

Understanding PSQL's MVCC

PostgresSQL implements something called MVCC. Which alleviates the need for a Read-Lock in many cases. However when Glenn Franxman and I were trying to figure out how this works we were having trouble understanding it completely.

If you ask about MVCC in the #postgres irc channel on Freenode, they will direct you to the following three links

With SQL there are four transaction isolation levels, unfortunately the Wikipedia entry only describes one, serializable. So relying on the Wikipedia documentation to describe how Postgres' opperates is not completely accurate. I think that internally Postgres may operate the way that the Wikipedia entry describes, but uses a write-lock to emulate a "Read Commit" isolation level to prevent the transaction abort that would normally occur by one transaction writing and comitting data . I don't know if that's true or not.

Another confusing bit about Postgres is that Postgres only implements two levels, "Read Committed" and "Serializable" even though you can set the other isolation levels. In the documentation, the four transaction isolation levels refers to the SQL standard and not Postgres' actually functionality.

So when you look at the four transaction isolation levels, you'll notice that it says that a dirty read is possible if you set the isolation level to "READ UNCOMMITTED", however because "READ UNCOMMITTED" is the same as "READ COMMITTED" internally in Postgres, you can never get a dirty read even if you set the isolation level to READ UNCOMMITTED. That's ok because, as the documentation states, "the four isolation levels only define which phenomena must not happen, they do not define which phenomena must happen."

I'll run some examples of concurrent transaction to show you how the four levels act.

Read Committed

Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
 
mvcc_test=# update mvcc_test set i = i + 1; -- 7
mvcc_test=# commit;
COMMIT
UPDATE 1
mvcc_test=# -- blocked until the other commited
 
mvcc_test=# commit;
COMMIT
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 3
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 3
(1 row)

Read Uncommitted

With Read Uncommited I observed the same behavior as Read Commited. This reflects the transaction documentation:

But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed
and Serializable. When you select the level Read Uncommitted you really get Read Committed, and when you select
Repeatable Read you really get Serializable, so the actual isolation level might be stricter than what you select.
Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ UNCOMMITTED;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ UNCOMMITTED;
SET
mvcc_test=# BEGIN; -- 1
BEGIN
mvcc_test=# BEGIN; -- 2.
BEGIN
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
 1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
 
mvcc_test=# update mvcc_test set i = i + 1;
mvcc_test=# commit;
COMMIT
UPDATE 1
mvcc_test=# -- blocked until the other commited
 
mvcc_test=# commit;
COMMIT
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 3
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 3
(1 row)

SERIALIZABLE

Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# SERIALIZABLE;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# SERIALIZABLE;
SET
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
 
mvcc_test=# update mvcc_test set i = i + 1;
mvcc_test=# commit;
COMMIT
ERROR:  could not serialize access due to
concurrent update
 
mvcc_test=# select * from mvcc_test;
ERROR:  current transaction is aborted, commands
ignored until end of transaction block
 
mvcc_test=# commit;
ROLLBACK
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)

Repeatable Read

Refering to the same paragraph in the transaction documentation:

But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed
and Serializable. When you select the level Read Uncommitted you really get Read Committed, and when you select
Repeatable Read you really get Serializable, so the actual isolation level might be stricter than what you select.
<http://www.postgresql.org/docs/current/static/transaction-iso.html>

Repeatable Read is the same as Serializable.

Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# REPEATABLE READ;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# REPEATABLE READ;
SET
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
 
mvcc_test=# update mvcc_test set i = i + 1;
mvcc_test=# commit;
COMMIT
ERROR:  could not serialize access due to
concurrent update
 
mvcc_test=# select * from mvcc_test;
ERROR:  current transaction is aborted, commands
ignored until end of transaction block
 
mvcc_test=# commit;
ROLLBACK
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)

What is possible in Isolation Levels

So, in the four transaction isolation levels table, it describes what is possible in the different isolation levels. Since I've demostrated that there are only two isolation levels in Postgresql. I'll demostrate the two read phenomena that could happen in both Serializable and Read Committed.

Let me explain what those two phenomena are. Here is a good explanation of read phenomena. I'm just simply repeating what is stated at that link

Nonrepeatable Read
S1 reads data which is later changed and commited by S2. If S1 reads the same data again (after S2's commit) and finds it to have changed or to be deleted (according to S2's changes), this is called a non-repeatable read. It is called non-repeatable because the same select statement doesn't return the same data (within the same transaction).
Phantom read
S1 reads data (select) with a specific where condition. After this read, S2 inserts some data that meets the S1's where condition and commits the inserted data. When S1 issues a select statement with the same where condition, it finds new records. It is called phantom read because the new records seem to be of phantom origin. A phantom read is thus a special case of a non-repeatable read.

Read Committed / Nonrepeatable Read Attempt

The SQL Standard says that with a Read Committed Isolation level, a "Nonrepeatable Read" is possible. Here is my attempt in proving that assertion.

Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
--
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# commit;
COMMIT
--
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)

This is a Nonrepeatable Read. Because the data has change while inside a transaction.

Read Committed / Phantom Read Attempt

The SQL Standard says that with a Read Committed Isolation level, a "Phantom Read" is possible. Here is my attempt in proving that assertion.

mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# select * from mvcc_test
  where id = 2;
 id | i
----+---
(0 rows)
mvcc_test=# select * from mvcc_test
   where id = 2;
  id | i
 ----+---
 (0 rows)
mvcc_test=# insert into mvcc_test (id, i)
 values (2, 1);
INSERT 0 1
mvcc_test=# select * from mvcc_test
   where id = 2;
  id | i
 ----+---
 (0 rows)
mvcc_test=# commit;
COMMIT
mvcc_test=# select * from mvcc_test where id = 2;
 id | i
----+---
  2 | 1
(1 row)

This is a phantom read because the row poofed into existance inside the transaction when it didn't exist before.

Serializable / Nonrepeatable Read Attempt

The SQL Standard says that with a Serializable Isolation level, a "Nonrepeatable Read" is not possible. Here is my attempt in proving that assertion.

Transaction 1 Transaction 2
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# SERIALIZABLE;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# SERIALIZABLE;
SET
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# update mvcc_test set i = i + 1;
UPDATE 1
--
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)
mvcc_test=# commit;
COMMIT
--
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 2
(1 row)
mvcc_test=# select * from mvcc_test;
 id | i
----+---
  1 | 1
(1 row)

This is not a Nonrepeatable Read. Because the data did not change while inside a transaction.

Read Committed / Phantom Read Attempt

The SQL Standard says that with a Read Committed Isolation level, a "Phantom Read" is possible. Here is my attempt in proving that assertion.

mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# BEGIN;
BEGIN
mvcc_test=# SET TRANSACTION ISOLATION LEVEL
mvcc_test-# READ COMMITTED;
SET
mvcc_test=# select * from mvcc_test
  where id = 2;
 id | i
----+---
(0 rows)
mvcc_test=# select * from mvcc_test
   where id = 2;
  id | i
 ----+---
 (0 rows)
mvcc_test=# insert into mvcc_test (id, i)
 values (2, 1);
INSERT 0 1
mvcc_test=# select * from mvcc_test
   where id = 2;
  id | i
 ----+---
 (0 rows)
mvcc_test=# commit;
COMMIT
mvcc_test=# select * from mvcc_test where id = 2;
 id | i
----+---
(0 row)

This is a phantom read because the new row did not appear inside the transaction after it was commit in the first transaction.

Eric Moritz
: django / postgresql

Comments !