Posted to oak-commits@jackrabbit.apache.org by ju...@apache.org on 2014/07/31 09:53:34 UTC
svn commit: r1614819 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md
Author: jukka
Date: Thu Jul 31 07:53:34 2014
New Revision: 1614819
URL: http://svn.apache.org/r1614819
Log:
OAK-1995: Improved SegmentNodeStore documentation
Add some TODOs for areas that could do with extra documentation
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md?rev=1614819&r1=1614818&r2=1614819&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segmentmk.md Thu Jul 31 07:53:34 2014
@@ -20,7 +20,8 @@ SegmentMK design overview
The SegmentMK is an Oak storage backend that stores content as various
types of *records* within larger *segments*. One or more *journals* are
-used to track the latest state of the repository.
+used to track the latest state of the repository. In the TarMK implementation
+only one "root" journal is used.
The SegmentMK was designed from the ground up based on the following
key principles:
@@ -110,7 +111,7 @@ The segment header consists of the follo
| External blob record references (blobrefcount x 2 bytes) |
| |
| ...... +--------+--------+--------+
- | | padding (set o 0) |
+ | | padding (set to 0) |
+--------+--------+--------+--------+--------+--------+--------+--------+
The first four bytes of a segment always contain the ASCII string "0aK\n",
@@ -162,7 +163,7 @@ Journals are special, atomically updated
state of the repository as a sequence of references to successive
root node records.
-A small system could consist of just a single journal and would
+A small system (like TarMK) could use just a single journal and would
serialize all repository updates through atomic updates of that journal.
A larger system that needs more write throughput can have more journals,
linked to each other in a tree hierarchy. Commits to journals in lower
@@ -259,6 +260,8 @@ The result is a hierarchically stored im
can be accessed in O(log N) time and the size overhead of updating or
inserting list elements is also O(log N).
+TODO: Links to HAMT documentation
+
Value records
-------------
@@ -338,3 +341,18 @@ and child nodes. This way a node can bec
remain reasonably efficient to access and modify. The main downside of
this alternative storage layout is that the ordering of child nodes is
lost.
+
+TarMK
+=====
+
+TODO:
+
+- tar entry checksums
+- graph and index entries
+- recovery mechanism
+- tar generations / cleanup
+- journal.log
+- compaction
+- cleanup
+- backup
+- slow startup / journal.log