Posted to notifications@asterixdb.apache.org by "Chen Luo (Jira)" <ji...@apache.org> on 2019/10/31 17:11:00 UTC

[jira] [Created] (ASTERIXDB-2666) One-Phase Log Replay Approach

Chen Luo created ASTERIXDB-2666:
-----------------------------------

             Summary: One-Phase Log Replay Approach
                 Key: ASTERIXDB-2666
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2666
             Project: Apache AsterixDB
          Issue Type: Wish
          Components: STO - Storage, TX - Transactions
            Reporter: Chen Luo


AsterixDB currently uses a classical two-phase log replay approach during recovery: it first identifies committed writes and then applies these committed writes to the LSM-trees. This is a standard approach for general-purpose transaction processing systems, but for AsterixDB, we can design something better.

AsterixDB uses a record-level transaction model where each write is committed as soon as possible by "entity commit". To exploit this property, we can design a one-phase log replay approach as follows:
* Start from the log head based on the low watermark LSN
* Whenever we see an update log record, store that log record in memory (for each job)
* Whenever we see an entity commit or abort record, redo the corresponding update log record immediately and remove it from memory
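
The steps above can be sketched roughly as follows. This is only an illustrative Python sketch, not AsterixDB code: LogRecord, its fields, the record kind constants, and the redo callback are all hypothetical names. It also assumes that an abort record simply discards the buffered updates of its job without redoing them; the list above groups commit and abort together, so the real design may treat aborts differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical record kinds; the real log record types may differ.
UPDATE, ENTITY_COMMIT, ABORT = "UPDATE", "ENTITY_COMMIT", "ABORT"


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    kind: str
    job_id: int
    entity_key: Optional[str] = None  # None for a job-level abort
    payload: Optional[str] = None


def one_phase_replay(
    log: Iterable[LogRecord],
    low_watermark_lsn: int,
    redo: Callable[[LogRecord], None],
) -> int:
    """Replay the log in a single pass; return the peak number of buffered updates."""
    # job_id -> entity_key -> update records that are not yet committed
    pending = defaultdict(lambda: defaultdict(list))
    buffered = 0
    peak = 0
    for rec in log:
        if rec.lsn < low_watermark_lsn:
            continue  # start from the low watermark LSN
        if rec.kind == UPDATE:
            pending[rec.job_id][rec.entity_key].append(rec)
            buffered += 1
            peak = max(peak, buffered)
        elif rec.kind == ENTITY_COMMIT:
            # Redo the matching updates right away, then forget them.
            updates = pending[rec.job_id].pop(rec.entity_key, [])
            for update in updates:
                redo(update)
            buffered -= len(updates)
        elif rec.kind == ABORT:
            # Assumption: aborted updates are dropped without redo.
            dropped = pending.pop(rec.job_id, {})
            buffered -= sum(len(v) for v in dropped.values())
    # Updates still pending here were never committed and are not applied.
    return peak
```

The returned peak is just a convenient way to observe how many update records had to be held in memory at once during the pass.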

The key property here is that the window between an update log record and its commit log record is very short, since we commit on a frame basis. Thus, this approach will speed up the recovery process by using only one log read pass and by avoiding storing all entity commits in memory. During the recovery process, we only need a small amount of memory, based on the window between updates and commits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)