Posted to commits@zookeeper.apache.org by fp...@apache.org on 2011/11/17 09:27:31 UTC

svn commit: r1203103 - in /zookeeper/bookkeeper/trunk: CHANGES.txt bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java doc/bookkeeperOverview.textile

Author: fpj
Date: Thu Nov 17 08:27:30 2011
New Revision: 1203103

URL: http://svn.apache.org/viewvc?rev=1203103&view=rev
Log:
BOOKKEEPER-109: Add documentation to describe how bookies flushes data (Sijie Guo via fpj)


Modified:
    zookeeper/bookkeeper/trunk/CHANGES.txt
    zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
    zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile

Modified: zookeeper/bookkeeper/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/CHANGES.txt?rev=1203103&r1=1203102&r2=1203103&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/CHANGES.txt (original)
+++ zookeeper/bookkeeper/trunk/CHANGES.txt Thu Nov 17 08:27:30 2011
@@ -109,3 +109,5 @@ IMPROVEMENTS:
  hedwig-client/
 
   BOOKKEEPER-44: Reuse publish channel to default server to avoid too many connect requests to default server when lots of producers came in same time (Sijie Guo via breed)
+
+  BOOKKEEPER-109: Add documentation to describe how bookies flushes data (Sijie Guo via fpj)

Modified: zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java?rev=1203103&r1=1203102&r2=1203103&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java (original)
+++ zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Bookie.java Thu Nov 17 08:27:30 2011
@@ -128,19 +128,19 @@ public class Bookie extends Thread {
      * <p>
      * Before flushing, SyncThread first records a log marker {journalId, journalPos} in memory,
      * which indicates entries before this log marker would be persisted to ledger files.
-     * Then sync thread begans flush ledger index pages to ledger index files, flush entry
+     * Then sync thread begins flushing ledger index pages to ledger index files, flush entry
      * logger to ensure all entries persisted to entry loggers for future reads.
      * </p>
      * <p>
      * After all data has been persisted to ledger index files and entry loggers, it is safe
      * to persist the log marker to disk. If bookie failed after persist log mark,
-     * bookie is able to relay journal entries started from last log mark without lossing
+     * bookie is able to relay journal entries started from last log mark without losing
      * any entries.
      * </p>
      * <p>
      * Those journal files whose id are less than the log id in last log mark, could be
      * removed safely after persisting last log mark. We provide a setting to let user keeping
-     * number of old journal files which may be used for munually recovery in critical disaster.
+     * number of old journal files which may be used for manual recovery in critical disaster.
      * </p>
      */
     class SyncThread extends Thread {
@@ -190,7 +190,7 @@ public class Bookie extends Thread {
                 }
                 lastLogMark.rollLog();
 
-                // list the journals whose has been marked
+                // list the journals that have been marked
                 List<Long> logs = listJournalIds(journalDirectory, new JournalIdFilter() {
                     @Override
                     public boolean accept(long journalId) {
@@ -769,7 +769,7 @@ public class Bookie extends Thread {
                             }
                             toFlush.clear();
 
-                            // check wether journal file is over file limit
+                            // check whether journal file is over file limit
                             if (bc.position() > MAX_JOURNAL_SIZE) {
                                 logFile.close();
                                 logFile = null;

Modified: zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile?rev=1203103&r1=1203102&r2=1203103&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile (original)
+++ zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile Thu Nov 17 08:27:30 2011
@@ -116,3 +116,50 @@ p. The trick to making everything work i
 # Find the highest consecutively recorded entry, _LR_ ; 
 # Make sure that all entries between _LC_ and _LR_ are on a quorum of bookies; 
 
+h1. Data Management in Bookie Server
+
+h2. Basics
+
+p. Bookie servers manage data in a log-structured way, implemented with three kinds of files:
+
+* _Journal_ : A journal file contains the BookKeeper transaction logs. Before any update takes place, a Bookie server ensures that a transaction describing the update is written to non-volatile storage. A new journal file is created when the Bookie server starts or when the current journal file reaches the journal file size threshold.
+* _Entry Log_ : An entry log file manages the written entries received from BookKeeper clients. Entries from different ledgers are aggregated and written sequentially, while their offsets are kept as pointers in _LedgerCache_ for fast lookup. A new entry log file is created when the Bookie server starts or when the current entry log file reaches the entry log size threshold. Old entry log files are removed by the _Garbage Collector Thread_ once they are no longer associated with any active ledger.
+* _Index File_ : An index file is created for each ledger, which comprises a header and several fixed-length index pages, recording the offsets of data stored in entry log files. 
+
+p. Since updating index files would introduce random disk I/O, index files are updated lazily, for performance, by a _Sync Thread_ running in the background. Before index pages are persisted to disk, they are held in _LedgerCache_ for lookup.
+
+* _LedgerCache_ : An in-memory pool that caches ledger index pages, which allows disk head scheduling to be managed more efficiently.
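+
+p. A minimal sketch of the idea behind _LedgerCache_ follows. The class and field names here (_SimpleLedgerCache_, _EntryLocation_) are made up for illustration and are not the actual BookKeeper classes:
+
+bc.. import java.util.HashMap;
+import java.util.Map;
+
+// Illustrative only, not the real LedgerCache: an in-memory map from
+// (ledgerId, entryId) to the location of that entry in an entry log file.
+// Not thread safe; the real cache also bounds its memory and tracks dirty pages.
+class EntryLocation {
+    final long logId;   // which entry log file holds the entry
+    final long offset;  // byte offset of the entry inside that file
+    EntryLocation(long logId, long offset) {
+        this.logId = logId;
+        this.offset = offset;
+    }
+}
+
+class SimpleLedgerCache {
+    // ledgerId -> (entryId -> location); a stand-in for real fixed-length index pages
+    private final Map<Long, Map<Long, EntryLocation>> pages = new HashMap<Long, Map<Long, EntryLocation>>();
+
+    void putEntryOffset(long ledgerId, long entryId, long logId, long offset) {
+        Map<Long, EntryLocation> page = pages.get(ledgerId);
+        if (page == null) {
+            page = new HashMap<Long, EntryLocation>();
+            pages.put(ledgerId, page);
+        }
+        page.put(entryId, new EntryLocation(logId, offset));
+    }
+
+    EntryLocation getEntryOffset(long ledgerId, long entryId) {
+        Map<Long, EntryLocation> page = pages.get(ledgerId);
+        return page == null ? null : page.get(entryId);
+    }
+}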
+
+h2. Add Entry
+
+p. When a Bookie server receives entries from clients to be written, these entries will go through the following steps to be persisted to disk:
+
+# Append the entry to the _Entry Log_ and obtain its position { logId , offset } ;
+# Update the index of this entry in _Ledger Cache_ ;
+# Append a transaction describing this update to the _Journal_ ;
+# Respond to the BookKeeper client ;
+
+* For performance reasons, _Entry Log_ buffers entries in memory and commits them in batches, while _Ledger Cache_ holds index pages in memory and flushes them lazily. Data flushing and how data integrity is ensured are discussed in the 'Data Flush' section below.
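+
+p. A rough sketch of the add-entry path described above is shown below. The interfaces and names are stand-ins for illustration, not the real _Bookie_, _EntryLogger_, _LedgerCache_ and journal classes; the point is the ordering of the four steps:
+
+bc.. // Illustrative only: the ordering of the four steps is what matters.
+interface EntryLogSketch { long[] addEntry(long ledgerId, long entryId, byte[] data); }  // returns {logId, offset}
+interface LedgerCacheSketch { void putEntryOffset(long ledgerId, long entryId, long logId, long offset); }
+interface JournalSketch { void logAddEntry(long ledgerId, long entryId, byte[] data); }
+interface ClientResponseSketch { void sendAddResponse(long ledgerId, long entryId); }
+
+class AddEntrySketch {
+    private final EntryLogSketch entryLog;
+    private final LedgerCacheSketch ledgerCache;
+    private final JournalSketch journal;
+    private final ClientResponseSketch client;
+
+    AddEntrySketch(EntryLogSketch el, LedgerCacheSketch lc, JournalSketch j, ClientResponseSketch c) {
+        this.entryLog = el;
+        this.ledgerCache = lc;
+        this.journal = j;
+        this.client = c;
+    }
+
+    void addEntry(long ledgerId, long entryId, byte[] data) {
+        // 1. Append the entry to the entry log; it hands back {logId, offset}.
+        long[] pos = entryLog.addEntry(ledgerId, entryId, data);
+        // 2. Record that offset in the ledger cache (a dirty index page, flushed lazily).
+        ledgerCache.putEntryOffset(ledgerId, entryId, pos[0], pos[1]);
+        // 3. Append a transaction describing the add to the journal; this write
+        //    is what makes the add durable before acknowledging.
+        journal.logAddEntry(ledgerId, entryId, data);
+        // 4. Only after the journal write do we respond to the BookKeeper client.
+        client.sendAddResponse(ledgerId, entryId);
+    }
+}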
+
+h2. Data Flush
+
+p. Ledger index pages are flushed to index files in the following two cases:
+
+# _LedgerCache_ memory reaches its limit. There is no more space available to hold newer index pages. Dirty index pages will be evicted from _LedgerCache_ and persisted to index files.
+# The background _Sync Thread_ flushes index pages from _LedgerCache_ to index files periodically.
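+
+p. The two triggers above can be summarized in a small sketch (illustrative only; the real _LedgerCache_ eviction and _Sync Thread_ scheduling are more involved, and the capacity value here is an assumption):
+
+bc.. import java.util.ArrayDeque;
+import java.util.Deque;
+
+class IndexPageFlushSketch {
+    // Each dirty "page" is just {ledgerId, firstEntryId} in this sketch.
+    private final Deque<long[]> dirtyPages = new ArrayDeque<long[]>();
+    private static final int CAPACITY = 1024;  // assumed cache limit
+
+    // Trigger 1: the cache is full, so a dirty page is evicted and written now.
+    void markDirty(long ledgerId, long firstEntryId) {
+        if (dirtyPages.size() >= CAPACITY) {
+            writeToIndexFile(dirtyPages.poll());
+        }
+        dirtyPages.add(new long[] { ledgerId, firstEntryId });
+    }
+
+    // Trigger 2: the background sync thread periodically flushes every dirty page.
+    void syncThreadFlushAll() {
+        long[] page;
+        while ((page = dirtyPages.poll()) != null) {
+            writeToIndexFile(page);
+        }
+    }
+
+    private void writeToIndexFile(long[] page) {
+        // Persist the page to its ledger's index file (omitted in this sketch).
+    }
+}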
+
+p. Besides flushing index pages, _Sync Thread_ is responsible for rolling journal files so that journal files do not use too much disk space. 
+
+p. The data flush flow in _Sync Thread_ is as follows:
+
+# Records a _LastLogMark_ in memory. The _LastLogMark_ contains two parts: _txnLogId_ (the file id of a journal) and _txnLogPos_ (the offset within that journal). It indicates that entries before this position have been persisted to both index and entry log files.
+# Flushes dirty index pages from _LedgerCache_ to index files, and flushes entry log files to ensure that all buffered entries in entry logs are persisted to disk.
+#* Ideally, a Bookie server would only need to flush the index pages and entry log files that contain entries before _LastLogMark_. However, neither _LedgerCache_ nor _Entry Log_ keeps a mapping back to journal files, so the thread flushes _LedgerCache_ and _Entry Log_ entirely, possibly flushing entries written after _LastLogMark_ as well. Flushing more is not a problem; it is just redundant.
+# Persists _LastLogMark_ to disk. This means that entries added before _LastLogMark_ have had both their entry data and their index pages persisted to disk, so it is now safe to remove journal files created earlier than _txnLogId_.
+#* If a Bookie server has crashed before persisting _LastLogMark_ to disk, it still has journal files containing entries for which index pages may not have been persisted. Consequently, when this Bookie server restarts, it inspects journal files to restore those entries; data isn't lost.
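+
+p. Putting the flow above into a sketch (illustrative only; the real _SyncThread_ and _LastLogMark_ in Bookie.java carry more detail, and the helper methods here are placeholders):
+
+bc.. // Illustrative only: the important property is the ordering: record the mark,
+// flush index pages and entry logs, then persist the mark and reclaim old journals.
+class LastLogMarkSketch {
+    final long txnLogId;   // id of the journal file the mark points into
+    final long txnLogPos;  // offset inside that journal file
+    LastLogMarkSketch(long txnLogId, long txnLogPos) {
+        this.txnLogId = txnLogId;
+        this.txnLogPos = txnLogPos;
+    }
+}
+
+class SyncPassSketch {
+    void flushOnce() {
+        // 1. Record the mark in memory *before* flushing anything.
+        LastLogMarkSketch mark = new LastLogMarkSketch(currentJournalId(), currentJournalOffset());
+
+        // 2. Flush dirty index pages and the entry logs. This may also flush
+        //    entries written after the mark, which is redundant but harmless.
+        flushLedgerCache();
+        flushEntryLogs();
+
+        // 3. Only now persist the mark. Everything before {txnLogId, txnLogPos}
+        //    is in index and entry log files, so older journal files can go.
+        persistMark(mark);
+        removeJournalsOlderThan(mark.txnLogId);
+    }
+
+    // Placeholders so the sketch stays self-contained.
+    private long currentJournalId() { return 0L; }
+    private long currentJournalOffset() { return 0L; }
+    private void flushLedgerCache() { }
+    private void flushEntryLogs() { }
+    private void persistMark(LastLogMarkSketch mark) { }
+    private void removeJournalsOlderThan(long journalId) { }
+}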
+
+p. Using the data flush mechanism above, it is safe for the _Sync Thread_ to skip data flushing when the Bookie server shuts down. However, _Entry Logger_ uses a _BufferedChannel_ to write entries in batches, so there might still be data buffered in the _BufferedChannel_ at shutdown. The Bookie server must therefore ensure that _Entry Logger_ flushes its buffered data during shutdown; otherwise, _Entry Log_ files become corrupted with partial entries.
+
+p. As described above, _EntryLogger#flush_ is invoked in the following two cases:
+* in _Sync Thread_ : to ensure that entries added before _LastLogMark_ are persisted to disk.
+* in _ShutDown_ : to ensure that buffered data is persisted to disk, avoiding corruption of entry log files by partial entries.
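+
+p. A last sketch of the shutdown case (illustrative only; the real _EntryLogger_ is a richer class, and a plain _java.io.Flushable_ stands in for it here):
+
+bc.. import java.io.Flushable;
+import java.io.IOException;
+
+class ShutdownSketch {
+    private final Flushable entryLogger;  // stand-in for the entry logger
+
+    ShutdownSketch(Flushable entryLogger) {
+        this.entryLogger = entryLogger;
+    }
+
+    void shutdown() throws IOException {
+        // The sync thread may safely skip its final pass, but any entries still
+        // sitting in the entry logger's buffer must reach disk here; otherwise
+        // the last entry log file would end with a truncated, partial entry.
+        entryLogger.flush();
+    }
+}
+
+p. On restart, entries that were journalled but not yet reflected in index or entry log files are restored from the journal, starting from the last persisted _LastLogMark_, so no acknowledged entry is lost.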