Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2011/10/19 23:18:11 UTC

[Cassandra Wiki] Update of "ArchitectureCommitLog" by RickBranson

The "ArchitectureCommitLog" page has been changed by RickBranson:
http://wiki.apache.org/cassandra/ArchitectureCommitLog?action=diff&rev1=5&rev2=6

Comment:
Updated to reflect current CommitLog implementation as of 1.0.

- The !CommitLog class manages the !CommitLogSegments, each of which corresponds to a file on disk containing a fixed-size !CommitLogHeader followed by serialized !RowMutation objects.
+ The !CommitLog class manages the !CommitLogSegments, each of which corresponds to a file on disk containing serialized !RowMutation objects.
  
- A !CommitLogHeader has one entry per !ColumnFamily, consisting of a dirty bit and a replay offset, indicating the position in the !CommitLog file to start replaying the log for a particular !ColumnFamily.
+ The !CommitLogSegment keeps track of which column families have been modified in memory using a hash map called cfLastWrite. cfLastWrite has one entry per !ColumnFamily: an offset giving the position in the !CommitLog file where the last write for that !ColumnFamily took place.
  
- Each insertion (deletion) has to first write a log entry to the !CommitLog.
+ Each mutation has to first write a log entry to the !CommitLog.
  
-  * The writing of all log entries is handled by a single thread in !CommitLogExecutorService.
-  * For the first insert to a given !ColumnFamily CF in each !CommitLogSegment, the !CommitLogHeader is updated: the CF's dirty bit is turned on and the replay offset for CF in the !CommitLogHeader is updated with the current position (represented by a !CommitLogContext object) in the !CommitLog file.
+  * All log entries are written by a single thread in one of the !CommitLogExecutorService classes.
+  * For the first mutation to a given !ColumnFamily CF in each !CommitLogSegment, an entry is set in the cfLastWrite map, keyed by the CF's id, containing the offset at which the mutation was written.
   * A !RowMutation entry is then appended to the !CommitLogSegment
-  * If !CommitLogSync is set to batch, the insertion further waits until the !CommitLogSegment is sync-ed to disk before the insert is allowed to proceed
+  * If the configuration directive !commitlog_sync is set to batch, the mutation waits until the !CommitLogSegment is synced to disk before it is allowed to proceed
   * Once a !CommitLogSegment becomes too large, a new segment is created and new operations are appended there instead.
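
The write path above can be sketched roughly as follows. This is an illustrative Python model, not Cassandra's Java code: the names Segment, CommitLog, max_segment_bytes and batch_sync are hypothetical, the length-prefixed record format is invented, and sync() only records a position instead of calling fsync. The sketch also refreshes the cf_last_write entry on every write, which is an assumption made to match the map's description as a last-write position.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for a CommitLogSegment (not Cassandra's API)."""
    segment_id: int
    data: bytearray = field(default_factory=bytearray)
    cf_last_write: dict = field(default_factory=dict)  # cf_id -> offset
    synced_upto: int = 0

    def append(self, cf_id, payload):
        offset = len(self.data)
        # Entry is created on the first write for this CF and, as an
        # assumption of this sketch, refreshed on every later write.
        self.cf_last_write[cf_id] = offset
        # Invented record format: 4-byte length prefix, then the payload.
        self.data += len(payload).to_bytes(4, "big") + payload
        return offset

    def sync(self):
        # A real implementation would fsync the file; here we only note it.
        self.synced_upto = len(self.data)


class CommitLog:
    """Hypothetical manager that appends to the newest segment."""

    def __init__(self, max_segment_bytes=64, batch_sync=False):
        self.max_segment_bytes = max_segment_bytes
        self.batch_sync = batch_sync
        self.segments = [Segment(0)]

    def add(self, cf_id, payload):
        active = self.segments[-1]
        if len(active.data) >= self.max_segment_bytes:
            # Segment is too large: new operations go to a fresh segment.
            active = Segment(active.segment_id + 1)
            self.segments.append(active)
        offset = active.append(cf_id, payload)
        if self.batch_sync:
            # Batch mode: do not return until the segment is synced.
            active.sync()
        return active.segment_id, offset
```

With max_segment_bytes=16, two 5-byte mutations fill the first segment (9 bytes each including the prefix), so a third mutation lands at offset 0 of a second segment.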
  
  On the completion of a flush for a !ColumnFamily CF,
  
+  * The !ReplayPosition for CF is written to the !SSTable metadata.
   * For each !CommitLogSegment F generated when or before the flush is initiated,
-   * If F is not the one being used when the flush was initated, the dirty bit for CF in the !CommitLogHeader of F is turned off
-    * If all dirty bits in the !CommitLogHeader are off, F is deleted.
-   * Otherwise, the dirty bit for CF in the !CommitLogHeader is turned on and the replay offset for CF is updated with the position in the log file when the flush was initiated.
+   * If F is not the one being used when the flush was initiated, the CF's entry in cfLastWrite is removed.
+    * If the cfLastWrite map is empty, the segment is no longer needed and is deleted.
+   * Otherwise, the CF's entry in the cfLastWrite map is set to the replay position at which the flush was initiated (as long as no writes have taken place).
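
One possible reading of the flush-completion steps above, as a Python sketch. The function name on_flush_complete, the dict-based segment records and the sstable_metadata dict are all hypothetical. The "as long as no writes have taken place" condition is interpreted here as "no write to the CF at or after the flush position", which is an assumption rather than something the page states.

```python
def on_flush_complete(segments, sstable_metadata, cf_id, flush_position):
    """Update segment bookkeeping after a flush of cf_id completes.

    segments: list of {"id": int, "cf_last_write": dict}, oldest first.
    flush_position: (segment_id, offset) captured when the flush started.
    Returns the segments that are still needed.
    """
    flush_segment_id, flush_offset = flush_position
    # Record the replay position alongside the flushed data.
    sstable_metadata[cf_id] = flush_position

    kept = []
    for seg in segments:
        if seg["id"] > flush_segment_id:
            # Created after the flush was initiated: left untouched.
            kept.append(seg)
            continue
        last_write = seg["cf_last_write"]
        if seg["id"] < flush_segment_id:
            # Older segment: everything it holds for this CF is flushed.
            last_write.pop(cf_id, None)
            if not last_write:
                continue  # no CF needs this segment any more: delete it
        else:
            # Segment in use when the flush began (assumed interpretation):
            # move the entry to the flush position unless a later write
            # to the CF has already been recorded.
            last = last_write.get(cf_id)
            if last is None or last < flush_offset:
                last_write[cf_id] = flush_offset
        kept.append(seg)
    return kept
```

For example, flushing CF 1 at position (2, 40) drops a segment 0 that only held writes for CF 1, keeps a segment 1 that still holds writes for CF 2, and moves CF 1's entry in segment 2 to offset 40.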
  
  Recovery during a restart,
  
   * Each !CommitLogSegment is iterated in ascending time order.
-  * The segment is read from the lowest replay offset among all entries in the !CommitLogHeader.
+  * The segment is read from the lowest replay offset among the !ReplayPositions read from the !SSTable metadata.
-  * For each log entry read, the log is replayed for a !ColumnFamily CF if the position of the log entry is no less than the replay offset for CF in the !CommitLogHeader.
+  * For each log entry read, the log is replayed for a !ColumnFamily CF if the position of the log entry is no less than the !ReplayPosition for CF in the most recent !SSTable metadata.
   * When log replay is done, all Memtables are force flushed to disk and all commitlog segments are deleted.
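
The replay filter described above might look like the following Python sketch. The function name replay and the tuple layouts are hypothetical; a replay position is modeled as a (segment_id, offset) pair compared lexicographically, and a CF with no SSTable metadata is assumed to replay from (0, 0). The final force flush and segment deletion are not modeled.

```python
def replay(segments, replay_positions):
    """Return the (cf_id, payload) entries that would be replayed.

    segments: list of (segment_id, entries), where entries is a list of
        (offset, cf_id, payload) in write order.
    replay_positions: cf_id -> (segment_id, offset) from SSTable metadata.
    """
    # Assumption: a CF that never flushed has no metadata and is
    # replayed from the very beginning of the log.
    cf_ids = {cf for _, entries in segments for _, cf, _ in entries}
    positions = {cf: replay_positions.get(cf, (0, 0)) for cf in cf_ids}

    # Nothing before the lowest replay position needs to be read.
    start = min(positions.values(), default=(0, 0))

    replayed = []
    # Iterate segments in ascending order of creation.
    for segment_id, entries in sorted(segments, key=lambda s: s[0]):
        for offset, cf_id, payload in entries:
            position = (segment_id, offset)
            if position < start:
                continue
            # Replay only entries at or after the CF's replay position.
            if position >= positions[cf_id]:
                replayed.append((cf_id, payload))
    return replayed
```

Entries written before a CF's recorded position are already in an SSTable, so only the later ones are applied again.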