Posted to commits@jackrabbit.apache.org by th...@apache.org on 2018/08/21 10:31:38 UTC
svn commit: r1838538 [4/35] - in /jackrabbit/site/live/oak/docs: ./ architecture/ coldstandby/ features/ nodestore/ nodestore/document/ nodestore/segment/ oak-mongo-js/ oak-mongo-js/fonts/ oak-mongo-js/scripts/ oak-mongo-js/scripts/prettify/ oak-mongo-...

Modified: jackrabbit/site/live/oak/docs/nodestore/overview.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/overview.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/overview.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/overview.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-02-21 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180221" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Node Storage</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-02-21<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,7 +155,6 @@
     <li><a href="../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li class="active"><a href="#"><span class="icon-chevron-down"></span>Node Storage</a>
@@ -164,7 +162,6 @@
       <ul class="nav nav-list">
     <li><a href="../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li><a href="../nodestore/persistent-cache.html" title="Persistent Cache"><span class="none"></span>Persistent Cache</a>  </li>
@@ -243,8 +240,7 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  -->
-<h1>Node Storage</h1>
+  --><h1>Node Storage</h1>
 <p>Oak comes with two node storage flavours: <a href="segment/overview.html">Segment</a> and <a href="documentmk.html">Document</a>. Segment storage is optimised for maximal performance in standalone deployments, and document storage is optimised for maximal scalability in clustered deployments.</p>
 <div class="section">
 <h2><a name="NodeStore_API"></a>NodeStore API</h2>

Modified: jackrabbit/site/live/oak/docs/nodestore/persistent-cache.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/persistent-cache.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/persistent-cache.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/persistent-cache.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-03-27 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180327" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Persistent Cache</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-03-27<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,14 +155,12 @@
     <li><a href="../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li><a href="../nodestore/overview.html" title="Node Storage"><span class="icon-chevron-down"></span>Node Storage</a>
       <ul class="nav nav-list">
     <li><a href="../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li class="active"><a href="#"><span class="none"></span>Persistent Cache</a>
@@ -243,99 +240,80 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  -->
-<div class="section">
+  --><div class="section">
 <h2><a name="Persistent_Cache"></a>Persistent Cache</h2>
 <p>The document storage optionally uses the persistent cache. The cache acts like an in-memory cache for old revisions, but in addition to keeping the most recently used nodes in memory, it also stores them on disk. That way, many reads from the storage backend (for example MongoDB) are replaced by reads from the local disk. This is especially useful if reads from the local disk are faster than reads from the storage backend. In addition, the persistent cache reduces the load on the storage backend.</p>
 <div class="section">
 <h3><a name="aOSGi_Configuration"></a>&#xa0;OSGi Configuration</h3>
 <p>The default OSGi configuration of the persistent cache is:</p>
 
-<div>
-<div>
-<pre class="source">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
+<div class="source">
+<div class="source"><pre class="prettyprint">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
     persistentCache=&quot;cache&quot;
 </pre></div></div>
-
 <p>Oak versions up to 1.4 have the persistent cache disabled by default, which is equivalent to setting the configuration entry to an empty String. Starting with Oak 1.6, the persistent cache is enabled by default and can be disabled by setting the configuration entry to <tt>&quot;-&quot;</tt>.</p></div>
 <div class="section">
 <h3><a name="Configuration_Options"></a>Configuration Options</h3>
 <p>The persistent cache configuration setting is a string of comma-separated elements. The first element is the directory where the cache is stored. Example:</p>
 
-<div>
-<div>
-<pre class="source">&quot;cache&quot;
+<div class="source">
+<div class="source"><pre class="prettyprint">&quot;cache&quot;
 </pre></div></div>
-
 <p>In this case, the data is stored in the directory &#x201c;cache&#x201d;, relative to the <tt>repository.home</tt> directory. If no repository home directory is configured, the directory is relative to the current working directory. Oak versions prior to 1.6 always resolve the directory relative to the current working directory and ignore the <tt>repository.home</tt> configuration. By default, there are at most two files (two generations) named &#x201c;cache-x.data&#x201d;, where x is an incrementing number (0, 1,&#x2026;). A file is at most 1 GB by default. When a file exceeds this size, the next file is created, and if there are more than two files, the oldest one is removed. If data from the older file is accessed, it is copied to the latest file. That way, data that has not been read recently is eventually removed.</p>
 <p>The following other configuration options are available:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>Size. A file is at most 1 GB by default. To change maximum size of a file, use &#x201c;size=x&#x201d;, where x is the size in MB.</p>
-</li>
+<p>Size. A file is at most 1 GB by default. To change the maximum size of a file, use &#x201c;size=x&#x201d;, where x is the size in MB.</p></li>
+  
 <li>
-
-<p>Node caching. By default, nodes at all revisions are cached. To disable this option, use &#x201c;-nodes&#x201d;.</p>
-</li>
+<p>Node caching. By default, nodes at all revisions are cached. To disable this option, use &#x201c;-nodes&#x201d;.</p></li>
+  
 <li>
-
-<p>Children caching. By default, the list of children of a node is cached. To disable this option, use &#x201c;-children&#x201d;.</p>
-</li>
+<p>Children caching. By default, the list of children of a node is cached. To disable this option, use &#x201c;-children&#x201d;.</p></li>
+  
 <li>
-
-<p>Diff caching. By default, the list of differences between two revisions is cached. To disable this option, use &#x201c;-diff&#x201d;.</p>
-</li>
+<p>Diff caching. By default, the list of differences between two revisions is cached. To disable this option, use &#x201c;-diff&#x201d;.</p></li>
+  
 <li>
-
-<p>Compaction. The cache file can be compacted and compressed (at a rate of around 100 MB per second) when it is closed. That way, the disk space is used more efficiently. To enable this option, use &#x201c;+compact&#x201d;. (Please note this feature was enabled by default in versions 1.2.1, 1.0.13, and older.)</p>
-</li>
+<p>Compaction. The cache file can be compacted and compressed (at a rate of around 100 MB per second) when it is closed. That way, the disk space is used more efficiently. To enable this option, use &#x201c;+compact&#x201d;. (Please note this feature was enabled by default in versions 1.2.1, 1.0.13, and older.)</p></li>
+  
 <li>
-
-<p>Compression. By default, the cache is compressed, saving space. To disable this option, use &#x201c;-compress&#x201d;.</p>
-</li>
+<p>Compression. By default, the cache is compressed, saving space. To disable this option, use &#x201c;-compress&#x201d;.</p></li>
+  
 <li>
-
-<p>Binary caching (removed in Oak 1.10). When using the BlobStore, binaries smaller than 1 MB are stored in the persistent cache by default. The maximum size can be changed using the setting &#x201c;binary=x&#x201d;, where x is the size in bytes. To disable the binary cache, use &#x201c;binary=0&#x201d;.</p>
-</li>
+<p>Binary caching (removed in Oak 1.10). When using the BlobStore, binaries smaller than 1 MB are stored in the persistent cache by default. The maximum size can be changed using the setting &#x201c;binary=x&#x201d;, where x is the size in bytes. To disable the binary cache, use &#x201c;binary=0&#x201d;.</p></li>
 </ul>
 <p>These settings can be appended to the persistent cache configuration string. An example configuration is:</p>
 
-<div>
-<div>
-<pre class="source">&quot;cache,size\=2048,-compact,-compress&quot;
+<div class="source">
+<div class="source"><pre class="prettyprint">&quot;cache,size\=2048,-compact,-compress&quot;
 </pre></div></div>
-
 <p>To disable the persistent cache entirely in Oak 1.6 and newer, use the following configuration:</p>
 
-<div>
-<div>
-<pre class="source">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
+<div class="source">
+<div class="source"><pre class="prettyprint">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
     persistentCache=&quot;-&quot;
 </pre></div></div>
-
 <p>Up to Oak version 1.4, either omit the persistentCache entry or set it to an empty String to disable the persistent cache.</p></div>
 <div class="section">
 <h3><a name="Journal_cache"></a>Journal cache</h3>
 <p>Since Oak 1.6.</p>
 <p>Diff cache entries can also be stored in a separate persistent cache and configured independently if needed. This can be done in the OSGi configuration as in the following example:</p>
 
-<div>
-<div>
-<pre class="source">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
+<div class="source">
+<div class="source"><pre class="prettyprint">org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService
     persistentCache=&quot;cache,size\=2048&quot;
     journalCache=&quot;diff-cache,size\=1024&quot;
 </pre></div></div>
-
 <p>The configuration options are the same as for the <tt>persistentCache</tt>, but options unrelated to the diff cache type are ignored. The default configuration is <tt>journalCache=&quot;diff-cache&quot;</tt> and can be disabled the same way as the regular persistent cache with a dash: <tt>journalCache=&quot;-&quot;</tt>.</p></div>
 <div class="section">
 <h3><a name="aDependencies"></a>&#xa0;Dependencies</h3>
 <p>Internally, the persistent cache uses a key-value store (basically a java.util.Map), which is persisted to disk. The current key-value store backend is the <a class="externalLink" href="http://www.h2database.com/html/mvstore.html">H2 MVStore</a>. This library is only needed if the persistent cache is configured. Version 1.4.185 or newer is needed.</p>
 
-<div>
-<div>
-<pre class="source">&lt;dependency&gt;
+<div class="source">
+<div class="source"><pre class="prettyprint">&lt;dependency&gt;
     &lt;groupId&gt;com.h2database&lt;/groupId&gt;
     &lt;artifactId&gt;h2-mvstore&lt;/artifactId&gt;
     &lt;version&gt;1.4.185&lt;/version&gt;

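The persistent cache configuration string described on this page (a directory, then comma-separated options such as `size=x`, `-nodes`, `-children`, `-diff`, `+compact`, `-compress`) can be illustrated with a small parser. This is a minimal sketch, not Oak's implementation: the class `PersistentCacheSettings` and its fields are hypothetical, it treats both `""` (Oak 1.4 and earlier) and `"-"` (Oak 1.6 and newer) as "disabled", and it omits the `binary=x` option that was removed in Oak 1.10. It assumes the backslash in `size\=2048` is escaping required by the OSGi configuration file format, so it expects the unescaped value `size=2048`.

```java
/**
 * Hypothetical sketch of parsing a persistentCache configuration value such as
 * "cache,size=2048,-compact,-compress". Not Oak's actual parsing code.
 */
final class PersistentCacheSettings {
    boolean disabled;              // "" (up to Oak 1.4) or "-" (Oak 1.6+)
    String directory;              // first element: the cache directory
    int maxFileSizeMB = 1024;      // "size=x" in MB; default 1 GB per file
    boolean cacheNodes = true;     // "-nodes" disables node caching
    boolean cacheChildren = true;  // "-children" disables children caching
    boolean cacheDiff = true;      // "-diff" disables diff caching
    boolean compact = false;       // "+compact" enables compaction on close
    boolean compress = true;       // "-compress" disables compression

    static PersistentCacheSettings parse(String value) {
        PersistentCacheSettings s = new PersistentCacheSettings();
        if (value == null || value.isEmpty() || value.equals("-")) {
            s.disabled = true;
            return s;
        }
        String[] parts = value.split(",");
        s.directory = parts[0].trim();
        for (int i = 1; i < parts.length; i++) {
            String option = parts[i].trim();
            if (option.startsWith("size=")) {
                s.maxFileSizeMB = Integer.parseInt(option.substring("size=".length()));
                continue;
            }
            switch (option) {
                case "-nodes":    s.cacheNodes = false; break;
                case "-children": s.cacheChildren = false; break;
                case "-diff":     s.cacheDiff = false; break;
                case "+compact":  s.compact = true; break;
                case "-compact":  s.compact = false; break;
                case "-compress": s.compress = false; break;
                default:
                    // This sketch rejects unknown options so typos surface early.
                    throw new IllegalArgumentException("Unknown option: " + option);
            }
        }
        return s;
    }
}
```

Rejecting unknown options is a choice made for this sketch; a real implementation might log and ignore them instead.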
Modified: jackrabbit/site/live/oak/docs/nodestore/segment/changes.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/segment/changes.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/segment/changes.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/segment/changes.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-02-21 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180221" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Changes in the data format</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-02-21<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,14 +155,12 @@
     <li><a href="../../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li><a href="../../nodestore/overview.html" title="Node Storage"><span class="icon-chevron-down"></span>Node Storage</a>
       <ul class="nav nav-list">
     <li><a href="../../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li><a href="../../nodestore/persistent-cache.html" title="Persistent Cache"><span class="none"></span>Persistent Cache</a>  </li>
@@ -242,14 +239,15 @@
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--->
-<h1>Changes in the data format</h1>
+--><h1>Changes in the data format</h1>
 <p>This document describes the changes in the storage format introduced by the Oak Segment Tar module. The purpose of this document is not only to enumerate such changes, but also to explain the rationale behind them. Pointers to Jira issues are provided for a much more terse description of changes. Changes are presented in chronological order.</p>
 <div class="section">
 <h2><a name="Generation_in_segment_headers"></a>Generation in segment headers</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3348">OAK-3348</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.2</li>
 </ul>
 <p>The GC algorithm implemented by Oak Segment Tar is based on the fundamental idea of grouping records into generations. When GC is performed, records belonging to older generations can be removed, while records belonging to newer generations have to be retained.</p>
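The generation idea from the Oak Segment Tar changes page (older generations can be removed, newer ones retained, and GC copies live records into a new generation to save them) can be shown with a toy model. This is a simplified sketch, not Oak Segment Tar's GC: the `Segment` and `GenerationGc` classes are hypothetical, liveness is passed in as a set instead of being computed by traversal, and the cutoff generation is chosen by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified segment: an id, the GC generation it belongs to, and its record ids. */
final class Segment {
    final String id;
    final int generation;
    final List<String> records;

    Segment(String id, int generation, List<String> records) {
        this.id = id;
        this.generation = generation;
        this.records = records;
    }
}

/** Toy model of generation-based GC; hypothetical, for illustration only. */
final class GenerationGc {
    /** Copies every live record into a new segment tagged with newGeneration. */
    static Segment compact(List<Segment> segments, Set<String> live, String newId, int newGeneration) {
        List<String> copied = new ArrayList<>();
        for (Segment s : segments) {
            for (String r : s.records) {
                if (live.contains(r)) {
                    copied.add(r);
                }
            }
        }
        return new Segment(newId, newGeneration, copied);
    }

    /** Keeps only segments whose generation is at least oldestRetained. */
    static List<Segment> cleanup(List<Segment> segments, int oldestRetained) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.generation >= oldestRetained) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Because live records were copied forward first, dropping every segment of an older generation in `cleanup` loses only unreachable records in this model.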
@@ -257,20 +255,24 @@
 <p>The original specification of the data format for the segment header left some space for future extensions. In the new format the generation is saved at offsets 10 to 13 as a 4-byte integer value.</p></div>
 <div class="section">
 <h2><a name="Stable_identifiers"></a>Stable identifiers</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3348">OAK-3348</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.2</li>
 </ul>
 <p>The fastest way to compare two node records is to compare their addresses. If their addresses are equal, the two node records are guaranteed to be equal. Transitively, given that records are immutable, the subtrees identified by those node records are guaranteed to be equal.</p>
 <p>The situation gets more complicated when the generation-based GC algorithm copies a node record over to a new generation to save it from being deleted. In this situation, two copies of the same node record live in two different generations, in two different segments and at two different addresses. To figure out whether two such node records are equal, it is not sufficient to compare their addresses.</p>
-<p>To overcome this problem, a stable identifier has been added to every node record: when a new node record is serialized, the address it is serialized to becomes its stable identifier. The stable identifier is included in the node record and becomes part of its serialized format. When the node record is copied to a new generation and a new segment, its address will inevitably change. The stable identifier instead, being part of the node record itself, will not change. This enables fast comparison between different copies of the same node records by just comparing their stable identifiers.</p>
+<p>To overcome this problem, a stable identifier has been added to every node record: when a new node record is serialized, the address it is serialized to becomes its stable identifier. The stable identifier is included in the node record and becomes part of its serialized format. When the node record is copied to a new generation and a new segment, its address will inevitably change. The stable identifier instead, being part of the node record itself, will not change. This enables fast comparison between different copies of the same node records by just comparing their stable identifiers. </p>
 <p>The stable identifier is serialized as an 18-byte string record. This record, in turn, is referenced from the node record by adding an additional 3-byte reference field to it. In total, stable identifiers add an overhead of 21 bytes to every node record in the worst case. Where possible, the 18-byte string record is shared between node records, so the aforementioned overhead is an upper limit.</p></div>
 <div class="section">
 <h2><a name="Binary_references_index"></a>Binary references index</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4201">OAK-4201</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.4</li>
 </ul>
 <p>The original data format in Oak Segment mandates that every segment maintains a list of references to external binaries. Every time a record references an external binary - i.e. a piece of binary data that is stored in a Blob Store - a new binary reference is added to its segment. The list of references to external binaries is inspected periodically by the Blob Store GC algorithm to determine which binaries are currently in use. The Blob Store GC algorithm removes every binary that is not reported as used by the Segment Store.</p>
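The segment header change on the Oak Segment Tar changes page stores the generation at offsets 10 to 13 as a 4-byte integer. A minimal sketch of reading and writing that field follows; the offsets come from the text, while the big-endian byte order (the `java.nio.ByteBuffer` default) and the hypothetical `SegmentHeader` class are assumptions, not Oak's code.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: reads and writes the GC generation stored at offsets
 * 10..13 of a segment header. Big-endian byte order is assumed.
 */
final class SegmentHeader {
    static final int GENERATION_OFFSET = 10;

    static int readGeneration(byte[] segment) {
        if (segment.length < GENERATION_OFFSET + Integer.BYTES) {
            throw new IllegalArgumentException("segment too short for a header");
        }
        // Absolute get: reads bytes 10..13 without moving the buffer position.
        return ByteBuffer.wrap(segment).getInt(GENERATION_OFFSET);
    }

    static void writeGeneration(byte[] segment, int generation) {
        // Absolute put: writes bytes 10..13 of the backing array.
        ByteBuffer.wrap(segment).putInt(GENERATION_OFFSET, generation);
    }
}
```

Reading a fixed-offset field this way means a GC process can learn a segment's generation without parsing any of its records.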
@@ -278,42 +280,50 @@
 <p>To make this process faster and ease the pressure on I/O, Oak Segment Tar introduces an index of references to external binaries in every TAR file. This index aggregates the required information from every segment contained in a TAR file. When Blob Store GC is performed, instead of reading and parsing every segment, it can read and parse the index files. This optimization significantly reduces the number of I/O operations.</p></div>
 <div class="section">
 <h2><a name="Simplified_segment_and_record_format"></a>Simplified segment and record format</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4631">OAK-4631</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.10</li>
 </ul>
 <p>The former data format limited the number of references to other segments a segment could have. This limitation caused sub-optimal segment space utilization when a record referencing data from many different segments was written. In this case records quickly exhausted the hard limit on the number of references to other segments, causing a premature flush of a non-full segment.</p>
 <p>Oak Segment Tar relaxed the limit on the number of referenced segments to the point that it can now be considered irrelevant. This avoids the problem of non-optimal segment space utilization. Tests show that with this change in place it is possible to store the same amount of data in a smaller number of better-utilized segments.</p>
-<p>The Jira issue referenced in this paragraph proposes other changes other than the one discussed here. Most of the changes proposed by the issue were subsequently reverted or never made in the code base because of their high toll on disk space. The comments on the issue and the referenced email thread provide a more detailed insight into the various trade-offs and considerations.</p></div>
+<p>The Jira issue referenced in this paragraph proposes changes other than the one discussed here. Most of the changes proposed by the issue were subsequently reverted or never made in the code base because of their high toll on disk space. The comments on the issue and the referenced email thread provide more detailed insight into the various trade-offs and considerations. </p></div>
 <div class="section">
 <h2><a name="Storage_format_versioning"></a>Storage format versioning</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4295">OAK-4295</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.10</li>
 </ul>
 <p>To prevent the (old) Oak Segment and the (new) Oak Segment Tar from stepping on each other&#x2019;s toes, an improved versioning mechanism for the data format was introduced.</p>
-<p>First of all, the version field in the segment header has been incremented from 11 in Oak Segment to 12 in Oak Segment Tar. This prevents Oak Segment Tar from accessing segments written by older implementations and Oak Segment accessing segments written by newer implementations.</p>
+<p>First of all, the version field in the segment header has been incremented from 11 in Oak Segment to 12 in Oak Segment Tar. This prevents Oak Segment Tar from accessing segments written by older implementations and Oak Segment from accessing segments written by newer implementations. </p>
 <p>This strategy has been further improved by adding a manifest file to every data folder created by Oak Segment Tar. The manifest file is meant to be a source of metadata for the whole repository. Oak Segment Tar checks for the presence of a manifest file every time a data folder is opened. If a manifest file is present, its metadata has to be compatible with the version of the currently executing code.</p>
 <p>Repositories written by Oak Segment do not generate a manifest file while those written by Oak Segment Tar do. This difference enables a fail-fast approach: when Oak Segment opens a data folder containing a manifest, it immediately fails complaining that the data format is too new. When Oak Segment Tar opens a non-empty data folder without a manifest, it immediately fails complaining that the data format is too old.</p></div>
 <div class="section">
 <h2><a name="Logic_record_IDs"></a>Logic record IDs</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4659">OAK-4659</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.14</li>
 </ul>
 <p>In the previous implementation (Oak Segment) the position of a record in its segment is fixed. Once written, its address consists of the identifier of its segment followed by its offset within the segment. The offset is the effective position of the record in the segment.</p>
-<p>This way of addressing records implies that a record can&#x2019;t be moved within a segment without changing its address. Moving a record means changing its segment, its position or both and results in all reference to it being broken.</p>
+<p>This way of addressing records implies that a record can&#x2019;t be moved within a segment without changing its address. Moving a record means changing its segment, its position or both, and results in all references to it being broken. </p>
 <p>To gain more flexibility for storing records, a new level of indirection was introduced, replacing offsets with logic identifiers. Instead of referencing a record by a segment identifier and its offset in the segment, a segment identifier and a record number are used. The record number is a logic address for a record in the segment and is local to the segment.</p>
 <p>With this solution, a record can be moved within the segment without breaking references to it. This change enables a number of different algorithms when it comes to garbage collection. For example, some records can now be removed from a segment and the segment can be shrunk down by moving the remaining records next to each other. This operation would change the positions of the remaining records in the segment, but not their logic record identifiers.</p>
 <p>This change introduced a new translation table in the segment header to map record numbers to record offsets. The table occupies 9 bytes per record (4 bytes for the record number, 1 byte for the record type and 4 bytes for the record offset). Moreover, a new 4-byte integer field containing the number of entries in the translation table has been added to the segment header.</p></div>
 <div class="section">
 <h2><a name="Root_record_types"></a>Root record types</h2>
-<ul>
 
+<ul>
+  
 <li>Jira issue: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2498">OAK-2498</a></li>
+  
 <li>Since: Oak Segment Tar 0.0.16</li>
 </ul>
 <p>The record number translation table mentioned in the previous section contains a 1-byte field for every record. This field determines the type of the record referenced by that row of the table. The change described in this section is about improving the information stored in the type field of the record number translation table.</p>
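The logic record ID change on the Oak Segment Tar changes page describes a translation table of 9-byte entries (4-byte record number, 1-byte record type, 4-byte offset) plus a 4-byte entry count. The sketch below encodes such a table and resolves a record number to an offset. It is a hypothetical illustration, not Oak's on-disk code: the class `RecordNumberTable`, the placement of the count directly before the entries, the type byte values, and the linear-scan lookup are all assumptions. The `relocate` method shows why the indirection helps: a record can move within a segment while its record number, and therefore every reference to it, stays the same.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a record number translation table. */
final class RecordNumberTable {
    // 4 bytes record number + 1 byte record type + 4 bytes record offset.
    static final int ENTRY_SIZE = 4 + 1 + 4;

    static ByteBuffer encode(int[] numbers, byte[] types, int[] offsets) {
        int n = numbers.length;
        if (types.length != n || offsets.length != n) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + n * ENTRY_SIZE);
        buf.putInt(n); // assumed: entry count stored right before the entries
        for (int i = 0; i < n; i++) {
            buf.putInt(numbers[i]).put(types[i]).putInt(offsets[i]);
        }
        buf.flip();
        return buf;
    }

    /** Returns the byte position of the entry for recordNumber, or -1. */
    private static int entryPosition(ByteBuffer table, int recordNumber) {
        int n = table.getInt(0);
        for (int i = 0; i < n; i++) {
            int base = Integer.BYTES + i * ENTRY_SIZE;
            if (table.getInt(base) == recordNumber) {
                return base;
            }
        }
        return -1;
    }

    /** Resolves a logic record number to its current offset, or -1 if absent. */
    static int offsetOf(ByteBuffer table, int recordNumber) {
        int base = entryPosition(table, recordNumber);
        return base < 0 ? -1 : table.getInt(base + 5); // skip number (4) and type (1)
    }

    /** Moves a record: only its offset changes; its record number does not. */
    static void relocate(ByteBuffer table, int recordNumber, int newOffset) {
        int base = entryPosition(table, recordNumber);
        if (base < 0) {
            throw new IllegalArgumentException("no such record: " + recordNumber);
        }
        table.putInt(base + 5, newOffset);
    }
}
```

For three records, the table in this layout takes 4 + 3 &#xd7; 9 = 31 bytes.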

Modified: jackrabbit/site/live/oak/docs/nodestore/segment/classes.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/segment/classes.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/segment/classes.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/segment/classes.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-03-27 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180327" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Design of Oak Segment Tar</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-03-27<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,14 +155,12 @@
     <li><a href="../../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li><a href="../../nodestore/overview.html" title="Node Storage"><span class="icon-chevron-down"></span>Node Storage</a>
       <ul class="nav nav-list">
     <li><a href="../../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li><a href="../../nodestore/persistent-cache.html" title="Persistent Cache"><span class="none"></span>Persistent Cache</a>  </li>
@@ -242,16 +239,15 @@
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--->
-<h1>Design of Oak Segment Tar</h1>
-<p>This section gives a high level overview of the design of Oak Segment Tar, its most important classes, their purpose and relationship. More in depth information is available from the Javadoc of the individual classes.</p>
+--><h1>Design of Oak Segment Tar</h1>
+<p>This section gives a high-level overview of the design of Oak Segment Tar, its most important classes, their purpose and relationships. More in-depth information is available from the Javadoc of the individual classes.</p>
 <div class="section">
 <h2><a name="Overview"></a>Overview</h2>
 <p><img src="classes.png" alt="Class diagram" /></p>
-<p>The <tt>SegmentNodeStore</tt> is Oak Segment Tar&#x2019;s implementation of the <a href="../overview.html">NodeStore API</a>. It uses a <tt>Revisions</tt> instance for accessing and setting the current head state, a <tt>SegmentReader</tt> for reading records from segments, a <tt>SegmentWriter</tt> for writing records to segments and a <tt>BlobStore</tt> for reading and writing binaries.</p>
-<p>The <tt>SegmentStore</tt> serves as a persistence backend for the <tt>SegmentNodeStore</tt>. It is responsible for providing concrete implementations of <tt>Revisions</tt>, <tt>SegmentReader</tt> and <tt>BlobStore</tt> to the former.</p>
+<p>The <tt>SegmentNodeStore</tt> is Oak Segment Tar&#x2019;s implementation of the <a href="../overview.html">NodeStore API</a>. It uses a <tt>Revisions</tt> instance for accessing and setting the current head state, a <tt>SegmentReader</tt> for reading records from segments, a <tt>SegmentWriter</tt> for writing records to segments and a <tt>BlobStore</tt> for reading and writing binaries. </p>
+<p>The <tt>SegmentStore</tt> serves as a persistence backend for the <tt>SegmentNodeStore</tt>. It is responsible for providing concrete implementations of <tt>Revisions</tt>, <tt>SegmentReader</tt> and <tt>BlobStore</tt> to the former. </p>
 <p>The <tt>FileStore</tt> is the implementation of the <tt>SegmentStore</tt> that persists segments in TAR files. The <tt>MemoryStore</tt> (not shown above) is an alternative implementation that stores segments in memory only. It is used for testing.</p>
-<p>The <tt>FileStore</tt> depends on <tt>TarFiles</tt> for the management of the TAR files on the file system. <tt>TarFiles</tt> is an aggregation of one <tt>TarWriter</tt> and zero or more <tt>TarReader</tt>. This design represents the foundation of the append-only store implemented by the <tt>FileStore</tt>, where data is appended to one <tt>TarWriter</tt> and archived in many <tt>TarReader</tt> over time.</p></div>
+<p>The <tt>FileStore</tt> depends on <tt>TarFiles</tt> for managing the TAR files on the file system. <tt>TarFiles</tt> aggregates one <tt>TarWriter</tt> and zero or more <tt>TarReader</tt> instances. This design is the foundation of the append-only store implemented by the <tt>FileStore</tt>: data is appended to a single <tt>TarWriter</tt> and, over time, archived in many <tt>TarReader</tt> instances.</p></div>
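To illustrate the append-only design, here is a hypothetical Python model of the aggregation: one active writer that receives new segments and a growing list of read-only archives. `TarFilesModel` and its methods are illustrative names, not Oak's Java API, and the roll-over rule (a fixed number of entries per file) is an assumption.

```python
class TarFilesModel:
    """Toy model: one writable archive plus zero or more read-only archives."""

    def __init__(self, max_entries_per_file: int):
        self.max_entries = max_entries_per_file
        self.writer = {}   # segment id -> data for the archive being appended to
        self.readers = []  # archived, immutable archives, oldest first

    def write_segment(self, segment_id: str, data: bytes) -> None:
        if len(self.writer) >= self.max_entries:
            # Roll over: freeze the current writer and start a new one.
            self.readers.append(dict(self.writer))
            self.writer = {}
        self.writer[segment_id] = data

    def read_segment(self, segment_id: str) -> bytes:
        # Look in the active writer first, then in archives from newest to oldest.
        if segment_id in self.writer:
            return self.writer[segment_id]
        for archive in reversed(self.readers):
            if segment_id in archive:
                return archive[segment_id]
        raise KeyError(segment_id)
```

Old data is never rewritten in place: new segments only ever go to the writer, and archived files are only read.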
         </div>
       </div>
     </div>

Modified: jackrabbit/site/live/oak/docs/nodestore/segment/overview.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/segment/overview.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/segment/overview.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/segment/overview.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-21 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180821" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Oak Segment Tar</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.6.min.css" />
@@ -137,7 +137,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-08-21<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>

Modified: jackrabbit/site/live/oak/docs/nodestore/segment/records.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/segment/records.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/segment/records.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/segment/records.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-02-21 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180221" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Segments and records</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-02-21<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,14 +155,12 @@
     <li><a href="../../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li><a href="../../nodestore/overview.html" title="Node Storage"><span class="icon-chevron-down"></span>Node Storage</a>
       <ul class="nav nav-list">
     <li><a href="../../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li><a href="../../nodestore/persistent-cache.html" title="Persistent Cache"><span class="none"></span>Persistent Cache</a>  </li>
@@ -242,16 +239,17 @@
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--->
-<h1>Segments and records</h1>
+--><h1>Segments and records</h1>
 <p>While <a href="tar.html">TAR files</a> and segments are a coarse-grained mechanism to divide the repository content into more manageable pieces, the real information is stored inside the segments as finer-grained records. This page details the structure of segments and shows the binary representation of data stored by Oak.</p>
 <div class="section">
 <h2><a name="Segments"></a>Segments</h2>
 <p>Segments are not all alike. Oak distinguishes data segments and bulk segments: the former store structured data (e.g. information about nodes and properties), while the latter contain unstructured data (e.g. the values of binary properties or of very long strings).</p>
 <p>A bulk segment can be told apart from a data segment just by looking at its identifier. A segment identifier is a randomly generated UUID. Segment identifiers are 16 bytes long, but Oak uses 4 bits to set apart bulk segments from data segments. The following bit patterns are used (each <tt>x</tt> represents four random bits):</p>
-<ul>
 
+<ul>
+  
 <li><tt>xxxxxxxx-xxxx-4xxx-axxx-xxxxxxxxxxxx</tt> data segment UUID</li>
+  
 <li><tt>xxxxxxxx-xxxx-4xxx-bxxx-xxxxxxxxxxxx</tt> bulk segment UUID</li>
 </ul>
 <p>(This encoding makes segment UUIDs appear as syntactically valid version 4 random UUIDs specified in RFC 4122.)</p></div>
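The bit patterns above can be checked with Python's standard `uuid` module. `segment_kind` and `new_segment_id` are hypothetical helpers for illustration, not Oak functions.

```python
import uuid


def segment_kind(segment_id: str) -> str:
    """Classify a segment UUID as 'data' or 'bulk' using the bit patterns above."""
    u = uuid.UUID(segment_id)
    if u.version != 4:
        raise ValueError("expected a version 4 UUID")
    marker = u.hex[16]  # first hex digit of the fourth group
    if marker == "a":
        return "data"
    if marker == "b":
        return "bulk"
    raise ValueError(f"not an Oak segment id (marker {marker!r})")


def new_segment_id(bulk: bool) -> str:
    """Generate a random UUID and force the data/bulk marker digit."""
    h = uuid.uuid4().hex
    h = h[:16] + ("b" if bulk else "a") + h[17:]
    return str(uuid.UUID(hex=h))
```

Both `a` and `b` fall within the RFC 4122 variant, which is why such identifiers still parse as ordinary version 4 UUIDs.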
@@ -259,11 +257,9 @@
 <h2><a name="Bulk_segments"></a>Bulk segments</h2>
 <p>Bulk segments contain raw binary data, interpreted simply as a sequence of block records with no headers or other extra metadata:</p>
 
-<div>
-<div>
-<pre class="source">[block 1] [block 2] ... [block N]
+<div class="source">
+<div class="source"><pre class="prettyprint">[block 1] [block 2] ... [block N]
 </pre></div></div>
-
 <p>A bulk segment whose length is <tt>n</tt> bytes consists of <tt>n div 4096</tt> block records of 4 KiB each, possibly followed by a block record of <tt>n mod 4096</tt> bytes if there are remaining bytes in the segment. The structure of a bulk segment can thus be determined from the segment length alone.</p></div>
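Because the layout depends only on the length, it can be computed directly. A short Python sketch (the function names are hypothetical):

```python
BLOCK_SIZE = 4096  # 4 KiB block records


def bulk_block_sizes(segment_length: int) -> list:
    """Sizes of the block records in a bulk segment of the given length."""
    full_blocks, remainder = divmod(segment_length, BLOCK_SIZE)
    sizes = [BLOCK_SIZE] * full_blocks
    if remainder:
        sizes.append(remainder)  # trailing partial block
    return sizes


def split_bulk_segment(data: bytes) -> list:
    """Cut raw bulk-segment bytes into block records; there is no header to parse."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
```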
 <div class="section">
 <h2><a name="Data_segments"></a>Data segments</h2>
@@ -273,17 +269,14 @@
 <p>The segment header also maintains a set of references to <i>root records</i>: those records that are not referenced from any other records in the segment.</p>
 <p>The overall structure of a data segment is:</p>
 
-<div>
-<div>
-<pre class="source">[segment header] [record 1] [record 2] ... [record N]
+<div class="source">
+<div class="source"><pre class="prettyprint">[segment header] [record 1] [record 2] ... [record N]
 </pre></div></div>
-
 <p>The segment header and each record are zero-padded to make their size a multiple of four bytes and to align the next record at a four-byte boundary.</p>
 <p>The segment header consists of the following fields. All integers are stored in big endian format.</p>
 
-<div>
-<div>
-<pre class="source">+---------+---------+---------+---------+---------+---------+---------+---------+
+<div class="source">
+<div class="source"><pre class="prettyprint">+---------+---------+---------+---------+---------+---------+---------+---------+
 | magic bytes: &quot;0aK&quot;          | version | reserved                               
 +---------+---------+---------+---------+---------+---------+---------+---------+
   reserved          | generation                            | segrefcount            
@@ -303,7 +296,6 @@
 |                                                 | padding (set to 0)          |
 +---------+---------+---------+---------+---------+---------+---------+---------+
 </pre></div></div>
-
 <p>The first three bytes of a segment always contain the ASCII string &#x201c;0aK&#x201d;, which is intended to make the binary segment data format easily detectable. The next byte indicates the version of the segment format and is currently set to 12.</p>
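A minimal Python sketch of how a reader might recognize a data segment from its first bytes. It assumes, based on the header diagram, that the 4-byte big-endian `generation` field starts at byte 10, after the magic bytes, the version byte and six reserved bytes. `read_header_prefix` is a hypothetical name.

```python
import struct

MAGIC = b"0aK"
SUPPORTED_VERSION = 12
GENERATION_OFFSET = 10  # read from the header diagram (assumption)


def read_header_prefix(segment: bytes):
    """Validate the magic bytes and version, then read the generation field."""
    if len(segment) < GENERATION_OFFSET + 4:
        raise ValueError("segment too short for a data segment header")
    if segment[:3] != MAGIC:
        raise ValueError("missing '0aK' magic bytes; not a data segment")
    version = segment[3]
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported segment version {version}")
    # All header integers are stored in big endian format.
    (generation,) = struct.unpack_from(">I", segment, GENERATION_OFFSET)
    return version, generation
```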
 <p>The <tt>generation</tt> field indicates the segment&#x2019;s generation with respect to garbage collection. This field is used by the garbage collector to determine whether a segment needs to be retained or can be collected.</p>
 <p>The <tt>segrefcount</tt> field indicates how many other segments are referenced by records within this segment. The identifiers of those segments are listed starting at offset 32 of the segment header. This lookup table is used to optimize garbage collection and to avoid having to repeat the 16-byte UUIDs whenever references to records in other segments are made.</p>
@@ -317,11 +309,9 @@
 <p>The record number field is a logical identifier for the record. The logical identifier is used as a lookup key in the record references table in the segment identified by the segment field. Once the correct row in the record references table is found, the record offset can be used to locate the position of the record in the segment.</p>
 <p>The offset is relative to the beginning of a theoretical segment which is defined to be 256 KiB. Since records are added from the bottom of a segment to the top (i.e. from higher to lower offsets), and since segments could be shrunk down to be smaller than 256 KiB, the offset has to be normalized with the following formula:</p>
 
-<div>
-<div>
-<pre class="source">SIZE - 256 KiB + OFFSET
+<div class="source">
+<div class="source"><pre class="prettyprint">SIZE - 256 KiB + OFFSET
 </pre></div></div>
-
 <p><tt>SIZE</tt> is the actual size of the segment under inspection, and <tt>OFFSET</tt> is the offset looked up from the record references table. The normalized offset can be used to locate the position of the record in the current segment.</p></div>
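The normalization formula translates directly into code. The sketch below uses a hypothetical helper name and adds a sanity check, for illustration only, that rejects offsets falling outside the actual segment.

```python
MAX_SEGMENT_SIZE = 256 * 1024  # size of the theoretical segment, 256 KiB


def normalized_offset(segment_size: int, table_offset: int) -> int:
    """Apply SIZE - 256 KiB + OFFSET to locate a record in a (possibly shrunk) segment."""
    position = segment_size - MAX_SEGMENT_SIZE + table_offset
    if not 0 <= position < segment_size:
        raise ValueError("offset does not fall inside this segment")
    return position
```

For a full-size segment the normalized offset equals the table offset. For a segment shrunk to 64 KiB, a table offset of 262000 maps to byte 65392.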
 <div class="section">
 <h2><a name="Records"></a>Records</h2>
@@ -339,44 +329,54 @@
 <p>Value records are used for storing names and values of the content tree. Since item names can be thought of as name values and since all JCR and Oak values can be expressed in binary form (strings encoded in UTF-8), it is easiest to simply use that form for storing all values. The size overhead of such a form for small value types like booleans or dates is amortized by the facts that those types are used only for a minority of values in typical content trees and that repeating copies of a value can be stored just once.</p>
 <p>There are four types of value records: small, medium, long and external. The small- and medium-sized values are stored in inline form, prepended by one or two bytes that indicate the length of the value. Long values of up to two exabytes (2^61 bytes) are stored as a list of block records. Finally, an external value record contains the length of the value and a string reference (up to 4kB in length) to some external storage location.</p>
 <p>The type of a value record is encoded in the high-order bits of the first byte of the record. These bit patterns are:</p>
-<ul>
 
+<ul>
+  
 <li><tt>0xxxxxxx</tt>: small value, length (0 - 127 bytes) encoded in 7 bits</li>
+  
 <li><tt>10xxxxxx</tt>: medium value, length (128 - 16511 bytes) encoded in 6 + 8 bits</li>
+  
 <li><tt>110xxxxx</tt>: long value, length (up to 2^61 bytes) encoded in 5 + 7*8 bits</li>
+  
 <li><tt>1110xxxx</tt>: external value, reference string length encoded in 4 + 8 bits</li>
 </ul></div>
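The prefix rules above can be sketched in Python. For medium values, 14 bits give 16384 distinct lengths, exactly the span from 128 to 16511, so this sketch assumes medium lengths are stored as an offset from 128. Long and external records are only classified, because their full encoding is not spelled out above. All function names are hypothetical.

```python
def value_record_kind(first_byte: int) -> str:
    """Classify a value record by the high-order bits of its first byte."""
    if (first_byte & 0x80) == 0x00:
        return "small"      # 0xxxxxxx
    if (first_byte & 0xC0) == 0x80:
        return "medium"     # 10xxxxxx
    if (first_byte & 0xE0) == 0xC0:
        return "long"       # 110xxxxx
    if (first_byte & 0xF0) == 0xE0:
        return "external"   # 1110xxxx
    raise ValueError(f"unknown value record header {first_byte:#04x}")


def encode_inline_length(n: int) -> bytes:
    """Length prefix for small (1 byte) and medium (2 bytes) values."""
    if 0 <= n < 128:
        return bytes([n])
    if 128 <= n <= 16511:
        m = n - 128  # assumed: medium lengths are stored relative to 128
        return bytes([0x80 | (m >> 8), m & 0xFF])
    raise ValueError("length needs a long or external value record")


def decode_inline_length(header: bytes):
    """Return (value_length, prefix_size) for small and medium records."""
    kind = value_record_kind(header[0])
    if kind == "small":
        return header[0] & 0x7F, 1
    if kind == "medium":
        return 128 + (((header[0] & 0x3F) << 8) | header[1]), 2
    raise ValueError(f"{kind} values are not stored inline")
```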
 <div class="section">
 <h3><a name="List_records"></a>List records</h3>
 <p>List records represent a general-purpose list of record identifiers. They are used as building blocks for other types of records, as we saw for value records and as we will see for template records and node records.</p>
 <p>The list record is a logical record using two different types of physical records to represent itself:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>bucket record: this is a recursive record representing a list of at most 255 references. A bucket record can reference other bucket records, hierarchically, or the record identifiers of the elements to be stored in the list. A bucket record doesn&#x2019;t maintain any other information exception record identifiers.</p>
-</li>
+<p>bucket record: this is a recursive record representing a list of at most 255 references. A bucket record can reference other bucket records, hierarchically, or the record identifiers of the elements to be stored in the list. A bucket record doesn&#x2019;t maintain any information other than record identifiers.</p></li>
+  
 <li>
-
-<p>list record: this is a top-level record that maintains the size of the list in an integer field and a record identifier pointing to a bucket.</p>
-<p>+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2013;+ | sub-list ID 1            | &#x2026; | +&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2013;+ | v +&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+ | record ID 1              | &#x2026; | record ID 255            | +&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+&#x2014;&#x2014;&#x2013;+</p>
-</li>
+<p>list record: this is a top-level record that maintains the size of the list in an integer field and a record identifier pointing to a bucket.</p></li>
 </ul>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">+--------+--------+--------+-----+
+| sub-list ID 1            | ... |
++--------+--------+--------+-----+
+  |
+  v
++--------+--------+--------+-----+--------+--------+--------+
+| record ID 1              | ... | record ID 255            |
++--------+--------+--------+-----+--------+--------+--------+
+</pre></div></div>
 <p>The result is a hierarchically stored immutable list where each element can be accessed in O(log N) time and the size overhead of updating or appending list elements (and thus creating a new immutable list) is also O(log N).</p>
-<p>List records are useful to store a list of references to other records. If the list is too big, it is split into different bucket records that may be  stored in the same segment or across segments. This guarantees good performance for small lists, without loosing the capability to store lists with a big number of elements.</p></div>
+<p>List records are useful to store a list of references to other records. If the list is too big, it is split into different bucket records that may be stored in the same segment or across segments. This guarantees good performance for small lists without losing the capability to store lists with a large number of elements.</p></div>
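The following Python sketch shows where the O(log N) bound comes from: it computes how many bucket levels a list needs and which slot to follow at each level to reach element `index`. It assumes buckets are filled completely, left to right, with 255 entries each. Oak's actual packing may differ, and the helper names are hypothetical.

```python
BUCKET_SIZE = 255  # maximum number of references in one bucket record


def list_levels(size: int) -> int:
    """Number of bucket levels needed to hold `size` record identifiers."""
    levels, capacity = 1, BUCKET_SIZE
    while capacity < size:
        levels += 1
        capacity *= BUCKET_SIZE
    return levels


def bucket_path(index: int, size: int) -> list:
    """Slot to follow in each bucket, from the top level down, to reach `index`."""
    if not 0 <= index < size:
        raise IndexError(index)
    path = []
    for _ in range(list_levels(size)):
        path.append(index % BUCKET_SIZE)  # base-255 digits of the index
        index //= BUCKET_SIZE
    return list(reversed(path))
```

Each additional level multiplies capacity by 255, so the path length, and therefore the number of bucket records touched by a lookup or an update, grows logarithmically with the list size.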
 <div class="section">
 <h3><a name="Map_records"></a>Map records</h3>
 <p>Map records implement a general-purpose unordered map of strings to record identifiers. They are used for nodes with a large number of properties or child nodes. Like lists, they are represented using two types of physical record:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>leaf record: if the number of elements in the map is small, they are all stored in a leaf record. This covers the simplest case for small maps.</p>
-</li>
+<p>leaf record: if the number of elements in the map is small, they are all stored in a leaf record. This covers the simplest case for small maps.</p></li>
+  
 <li>
-
-<p>branch record: if the number of elements in the map is too big, the original map is split into smaller maps based on a hash function applied to the keys of the map. A branch record is recursive, because it can reference other branch records if the sub-maps are too big and need to be split again.</p>
-</li>
+<p>branch record: if the number of elements in the map is too big, the original map is split into smaller maps based on a hash function applied to the keys of the map. A branch record is recursive, because it can reference other branch records if the sub-maps are too big and need to be split again.</p></li>
 </ul>
 <p>Maps are stored using the hash array mapped trie (HAMT) data structure. The hash code of each key is split into pieces of 5 bits each, and the keys are sorted into 32 (2^5) buckets based on the first 5 bits. If a bucket contains fewer than 32 entries, it is stored directly as a list of key-value pairs. Otherwise the keys are split into sub-buckets based on the next 5 bits of their hash codes. When all buckets are stored, the list of top-level bucket references is stored along with the total number of entries in the map.</p>
 <p>The result is a hierarchically stored immutable map where each element can be accessed in O(log N) time and the size overhead of updating or inserting map entries is also O(log N).</p>
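A simplified Python sketch of the HAMT bucketing step. It assumes a 32-bit Java-style `String.hashCode` as the hash and treats the "first" 5 bits as the most significant ones. Both are assumptions for illustration, and `build_trie` is a hypothetical name rather than Oak's implementation.

```python
BITS_PER_LEVEL = 5
FANOUT = 1 << BITS_PER_LEVEL            # 32 buckets per level
MAX_LEVEL = 32 // BITS_PER_LEVEL - 1    # levels 0..5 use whole 5-bit pieces


def java_string_hash(s: str) -> int:
    """Java's String.hashCode as an unsigned 32-bit value (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_index(hash32: int, level: int) -> int:
    """Pick the 5-bit piece of the hash used at the given trie level."""
    shift = 32 - BITS_PER_LEVEL * (level + 1)
    return (hash32 >> shift) & (FANOUT - 1)


def build_trie(keys, level: int = 0):
    """Leaf (sorted list) for fewer than 32 keys, else a dict of sub-buckets."""
    if len(keys) < FANOUT or level > MAX_LEVEL:
        return sorted(keys)
    buckets = {}
    for key in keys:
        buckets.setdefault(bucket_index(java_string_hash(key), level), []).append(key)
    return {i: build_trie(sub, level + 1) for i, sub in buckets.items()}


def count_entries(node) -> int:
    """Total number of keys stored below a trie node."""
    if isinstance(node, list):
        return len(node)
    return sum(count_entries(child) for child in node.values())
```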
@@ -388,21 +388,17 @@
 <p>The template record allows Oak to handle simple modifications to nodes in the most efficient way possible.</p>
 <p>As such a template record describes the common structure of a family of related nodes. Since the structures of most nodes in a typical content tree fall into a small set of common templates, it makes sense to store such templates separately instead of repeating that information separately for each node. For example, the property names and types as well as child node names of all nt:file nodes are typically the same. The presence of mixins and different subtypes increases the number of different templates, but they&#x2019;re typically still far fewer than nodes in the repository.</p>
 <p>A template record consists of a set of up to N (exact size TBD, N ~ 256) property name and type pairs. Additionally, since nodes that are empty or contain just a single child node are most common, a template record also contains information whether the node has zero, one or many child nodes. In case of a single child node, the template also contains the name of that node. For example, the template for typical mix:versionable nt:file nodes would be (using CND-like notation):</p>
-<ul>
-
-<li>jcr:primaryType (NAME)
-<ul>
 
-<li>jcr:mixinTypes (NAME) multiple</li>
-<li>jcr:created (DATE)</li>
-<li>jcr:uuid (STRING)</li>
-<li>jcr:versionHistory (REFERENCE)</li>
-<li>jcr:predecessors (REFERENCE) multiple</li>
-<li>jcr:baseVersion (REFERENCE)</li>
-<li>jcr:content</li>
-</ul>
-</li>
-</ul>
+<div class="source">
+<div class="source"><pre class="prettyprint">- jcr:primaryType (NAME)
+- jcr:mixinTypes (NAME) multiple
+- jcr:created (DATE)
+- jcr:uuid (STRING)
+- jcr:versionHistory (REFERENCE)
+- jcr:predecessors (REFERENCE) multiple
+- jcr:baseVersion (REFERENCE)
++ jcr:content
+</pre></div></div>
 <p>The names used in a template are stored as separate value records and included by reference. This way multiple templates that for example all contain the &#x201c;jcr:primaryType&#x201d; property name don&#x2019;t need to repeatedly store it.</p></div>
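The idea of sharing one template across structurally identical nodes can be sketched in Python. Here a template captures property names and types plus the zero/one/many child-node information, but no property values. The `Template` tuple and the `template_of` helper are hypothetical and ignore the size limit N.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Template(NamedTuple):
    properties: Tuple[Tuple[str, str, bool], ...]  # (name, type, multiple)
    child_mode: str                               # "zero", "one" or "many"
    child_name: Optional[str]                     # only set when child_mode == "one"


def template_of(properties: Dict[str, Tuple[str, bool]],
                child_names: List[str]) -> Template:
    """Derive a node's template from its property names/types and child names."""
    props = tuple(sorted((name, ptype, multiple)
                         for name, (ptype, multiple) in properties.items()))
    if not child_names:
        return Template(props, "zero", None)
    if len(child_names) == 1:
        return Template(props, "one", child_names[0])
    return Template(props, "many", None)
```

Templates are plain immutable values, so structurally identical nodes, such as many `nt:file` nodes, map to the same template, which only needs to be stored once.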
 <div class="section">
 <h3><a name="Node_records"></a>Node records</h3>

Modified: jackrabbit/site/live/oak/docs/nodestore/segment/tar.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/segment/tar.html?rev=1838538&r1=1838537&r2=1838538&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/segment/tar.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/segment/tar.html Tue Aug 21 10:31:37 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-08-10 
+ | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-02-21 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180810" />
+    <meta name="Date-Revision-yyyymmdd" content="20180221" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Structure of TAR files</title>
     <link rel="stylesheet" href="../../css/apache-maven-fluido-1.6.min.css" />
@@ -52,7 +52,6 @@
         <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
         <ul class="dropdown-menu">
             <li><a href="http://www.day.com/specs/jcr/2.0/index.html" title="JCR API">JCR API</a></li>
-            <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" title="Jackrabbit API">Jackrabbit API</a></li>
             <li><a href="../../oak_api/overview.html" title="Oak API">Oak API</a></li>
         </ul>
       </li>
@@ -137,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-08-10<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-02-21<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -156,14 +155,12 @@
     <li><a href="../../architecture/nodestate.html" title="The Node State Model"><span class="none"></span>The Node State Model</a>  </li>
           <li class="nav-header">Main APIs</li>
     <li><a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API"><span class="none"></span>JCR API</a>  </li>
-    <li><a href="https://jackrabbit.apache.org/jcr/jcr-api.html" class="externalLink" title="Jackrabbit API"><span class="none"></span>Jackrabbit API</a>  </li>
     <li><a href="../../oak_api/overview.html" title="Oak API"><span class="none"></span>Oak API</a>  </li>
           <li class="nav-header">Features and Plugins</li>
     <li><a href="../../nodestore/overview.html" title="Node Storage"><span class="icon-chevron-down"></span>Node Storage</a>
       <ul class="nav nav-list">
     <li><a href="../../nodestore/documentmk.html" title="Document NodeStore"><span class="icon-chevron-down"></span>Document NodeStore</a>
       <ul class="nav nav-list">
-    <li><a href="../../nodestore/document/mongo-document-store.html" title="MongoDB DocumentStore"><span class="none"></span>MongoDB DocumentStore</a>  </li>
     <li><a href="../../nodestore/document/node-bundling.html" title="Node Bundling"><span class="none"></span>Node Bundling</a>  </li>
     <li><a href="../../nodestore/document/secondary-store.html" title="Secondary Store"><span class="none"></span>Secondary Store</a>  </li>
     <li><a href="../../nodestore/persistent-cache.html" title="Persistent Cache"><span class="none"></span>Persistent Cache</a>  </li>
@@ -242,8 +239,7 @@
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--->
-<h1>Structure of TAR files</h1>
+--><h1>Structure of TAR files</h1>
 <p>This page describes the physical layout of a TAR file as used by Apache Oak. First, it gives a brief introduction to the TAR format. Next, it provides more details about the low-level information written in TAR entries. Finally, it describes how Oak saves a graph data structure inside the TAR file and how this representation is optimized for fast retrieval.</p>
 <div class="section">
 <h2><a name="Organization_of_a_TAR_file"></a>Organization of a TAR file</h2>
@@ -251,82 +247,66 @@
 <p>Logically speaking, a TAR file is a linear sequence of entries. Every entry is represented by two or more blocks. The first block always contains the entry header. Subsequent blocks store the content of the file.</p>
 <p><img src="tar.png" alt="Overview of a TAR file" /></p>
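Python's standard `tarfile` module writes the same block-oriented format, which makes the layout easy to inspect. The sketch below builds a one-entry archive in memory and reads the name, size and checksum fields straight from the 512-byte header block, using the standard ustar field offsets. `make_tar`, `read_header` and `header_checksum_ok` are illustrative names, not part of Oak.

```python
import io
import tarfile

BLOCK_SIZE = 512  # TAR block size


def make_tar(name: str, payload: bytes) -> bytes:
    """Build an in-memory ustar archive containing a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_header(archive: bytes, offset: int = 0):
    """Decode the file name and size fields of the header block at `offset`."""
    header = archive[offset:offset + BLOCK_SIZE]
    name = header[0:100].rstrip(b"\0").decode("utf-8")
    size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)  # octal string
    return name, size


def header_checksum_ok(header: bytes) -> bool:
    """Checksum = byte sum of the header with the checksum field read as spaces."""
    stored = int(header[148:156].replace(b"\0", b" ").strip(), 8)
    computed = sum(header[:148]) + 8 * ord(" ") + sum(header[156:BLOCK_SIZE])
    return stored == computed
```

The file content starts in the block right after the header and is zero-padded to a 512-byte boundary; the remaining header fields (mode, owner and group IDs, modification time, type, linked file name) sit at their standard ustar offsets in between.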
 <p>The entry header is composed of the following fields:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>file name (100 bytes) - name of the file stored in this entry.</p>
-</li>
+<p>file name (100 bytes) - name of the file stored in this entry.</p></li>
+  
 <li>
-
-<p>file mode (8 bytes) - string representation of the octal file mode.</p>
-</li>
+<p>file mode (8 bytes) - string representation of the octal file mode.</p></li>
+  
 <li>
-
-<p>owner&#x2019;s numeric ID (8 bytes) - string representation of the user ID of the owner of the file.</p>
-</li>
+<p>owner&#x2019;s numeric ID (8 bytes) - string representation of the user ID of the  owner of the file.</p></li>
+  
 <li>
-
-<p>group&#x2019;s numeric ID (8 bytes) - string representation of the group ID of the owner of the file.</p>
-</li>
+<p>group&#x2019;s numeric ID (8 bytes) - string representation of the group ID of the  owner of the file.</p></li>
+  
 <li>
-
-<p>file size (12 bytes) - string representation of the octal size of the file.</p>
-</li>
+<p>file size (12 bytes) - string representation of the octal size of the file.</p></li>
+  
 <li>
-
-<p>last modification time (12 bytes) - string representation of the octal time stamp when the file was last modified.</p>
-</li>
+<p>last modification time (12 bytes) - string representation of the octal time  stamp when the file was last modified.</p></li>
+  
 <li>
-
-<p>checksum (8 bytes) - checksum for the header data.</p>
-</li>
+<p>checksum (8 bytes) - checksum for the header data.</p></li>
+  
 <li>
-
-<p>file type (1 byte) - type of the file stored in the entry. This field specifies if the file is a regular file, a hard link or a symbolic link.</p>
-</li>
+<p>file type (1 byte) - type of the file stored in the entry. This field  specifies if the file is a regular file, a hard link or a symbolic link.</p></li>
+  
 <li>
-
-<p>name of linked file (100 bytes) - in case the file stored in the entry is a link, this field stores the name of the file pointed to by the link.</p>
-</li>
+<p>name of linked file (100 bytes) - in case the file stored in the entry is a link, this field stores the name of the file pointed to by the link.</p></li>
 </ul></div>
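As an illustration of the header layout above, the following Java sketch decodes the name, size, modification time, and type of an entry from a single 512-byte header block. The field offsets are the running totals of the field sizes listed above (the standard TAR layout). The class and method names are hypothetical and not part of Oak's API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch: decode a few fields of a 512-byte TAR entry header.
 * Field offsets are the running totals of the field sizes listed above.
 */
final class TarHeader {
    static final int BLOCK_SIZE = 512;

    final String name;   // file name, bytes 0..99
    final long size;     // file size, bytes 124..135 (octal)
    final long mtime;    // last modification time, bytes 136..147 (octal)
    final char type;     // file type, byte 156

    private TarHeader(String name, long size, long mtime, char type) {
        this.name = name;
        this.size = size;
        this.mtime = mtime;
        this.type = type;
    }

    static TarHeader parse(byte[] block) {
        if (block.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("a TAR header is exactly one 512-byte block");
        }
        return new TarHeader(
                text(block, 0, 100),
                octal(block, 124, 12),
                octal(block, 136, 12),
                (char) block[156]);
    }

    /** Reads a NUL-terminated ASCII string from a fixed-width field. */
    static String text(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.US_ASCII);
    }

    /** Parses an ASCII octal number, skipping leading spaces and stopping at the first non-octal byte. */
    static long octal(byte[] b, int off, int len) {
        int i = off;
        while (i < off + len && b[i] == ' ') {
            i++;
        }
        long value = 0;
        for (; i < off + len && b[i] >= '0' && b[i] <= '7'; i++) {
            value = (value << 3) + (b[i] - '0');
        }
        return value;
    }

    /** Number of content blocks after the header: the file content is padded up to a whole block. */
    static long contentBlocks(long size) {
        return (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }
}
```

Because the content is padded to a whole number of blocks, the next entry header starts `1 + contentBlocks(size)` blocks after the current one.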
 <div class="section">
 <h2><a name="The_TAR_file_as_used_by_Oak"></a>The TAR file as used by Oak</h2>
 <p>Some fields are not used by Oak. In particular, Oak sets the file mode, the owner&#x2019;s numeric ID, the group&#x2019;s numeric ID, the checksum, and the name of the linked file to uninteresting values. The only meaningful values assigned to the fields of the entry header are:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>file name: the name of the data file. There are different data files used by Oak. They are described below.</p>
-</li>
+<p>file name: the name of the data file. There are different data files used by  Oak. They are described below.</p></li>
+  
 <li>
-
-<p>file size: the size of the data file. The value assigned to this field is trivially computed from the amount of information stored in the data file.</p>
-</li>
+<p>file size: the size of the data file. The value assigned to this field is  trivially computed from the amount of information stored in the data file.</p></li>
+  
 <li>
-
-<p>last modification time: the time stamp when the entry was written.</p>
-</li>
+<p>last modification time: the time stamp when the entry was written.</p></li>
 </ul>
 <p>There are four kinds of files stored in a TAR file:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>segments: this type of file contains data about a segment in the segment store. This kind of file has a file name in the form <tt>UUID.CRC2</tt>, where <tt>UUID</tt> is a 128-bit UUID represented as a hexadecimal string and <tt>CRC2</tt> is a zero-padded numeric string representing the CRC2 checksum of the raw segment data.</p>
-</li>
+<p>segments: this type of file contains data about a segment in the segment store. This kind of file has a file name in the form <tt>UUID.CRC2</tt>, where <tt>UUID</tt> is a 128-bit UUID represented as a hexadecimal string and <tt>CRC2</tt> is a zero-padded numeric string representing the CRC2 checksum of the raw segment data.</p></li>
+  
 <li>
-
-<p>binary references: this file has a name ending in <tt>.brf</tt> and represents a catalog of blobs (i.e. value records) referenced by segments in this TAR file. This catalog is indexed by the generation of the segments it contains.</p>
-</li>
+<p>binary references: this file has a name ending in <tt>.brf</tt> and represents a  catalog of blobs (i.e. value records) referenced by segments in this TAR file.  This catalog is indexed by the generation of the segments it contains.</p></li>
+  
 <li>
-
-<p>graph: this file has a name ending in <tt>.gph</tt> and contains the segment graph of all the segments in this TAR file. The graph is represented as an adjacency list of UUIDs.</p>
-</li>
+<p>graph: this file has a name ending in <tt>.gph</tt> and contains the segment graph of all the segments in this TAR file. The graph is represented as an adjacency list of UUIDs.</p></li>
+  
 <li>
-
-<p>index: this file has a name ending in <tt>.idx</tt> and contains a sorted list of every segment contained in the TAR file.</p>
-</li>
+<p>index: this file has a name ending in <tt>.idx</tt> and contains a sorted list of  every segment contained in the TAR file.</p></li>
 </ul></div>
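A minimal sketch of telling the four kinds of entries apart by name. Beyond what is stated above, it assumes that the UUID part of a segment entry name is 32 hexadecimal digits, optionally separated by the usual dashes; the checksum part is left uninterpreted. `EntryNames` is a hypothetical helper, not an Oak class.

```java
import java.util.UUID;

/**
 * Sketch: classify TAR entry names into the four kinds of files described above.
 * Assumption: a segment entry is named "<uuid>.<checksum>", where <uuid> is 32
 * hexadecimal digits, with or without the usual dashes.
 */
final class EntryNames {
    enum Kind { SEGMENT, BINARY_REFERENCES, GRAPH, INDEX }

    static Kind kindOf(String name) {
        if (name.endsWith(".brf")) {
            return Kind.BINARY_REFERENCES;
        }
        if (name.endsWith(".gph")) {
            return Kind.GRAPH;
        }
        if (name.endsWith(".idx")) {
            return Kind.INDEX;
        }
        return Kind.SEGMENT; // every other entry is treated as a segment in this sketch
    }

    /** Extracts the segment UUID from the part of the name before the first dot. */
    static UUID segmentId(String name) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("missing checksum suffix: " + name);
        }
        String hex = name.substring(0, dot).replace("-", "");
        if (hex.length() != 32) {
            throw new IllegalArgumentException("not a 128-bit UUID: " + name);
        }
        // Parse each 64-bit half as unsigned hex; throws NumberFormatException on non-hex digits.
        long msb = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long lsb = Long.parseUnsignedLong(hex.substring(16), 16);
        return new UUID(msb, lsb);
    }
}
```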
 <div class="section">
 <h2><a name="Oak_TAR_file_layout"></a>Oak TAR file layout</h2>
@@ -343,66 +323,67 @@
 <p>The list of segments referenced by a data segment will end up in the graph file. To speed up locating a segment in the list of referenced segments, this list is kept sorted.</p>
 <p>The data segment file is divided into two parts: the header, and the actual records contained in this segment.</p>
 <p>The data segment header is divided into three parts:</p>
-<ul>
 
+<ul>
+  
 <li>
-
 <p>a fixed part (32 bytes) containing:</p>
+  
 <ul>
-
-<li>
-
-<p>a magic number (3 bytes): identifies the beginning of a data segment.</p>
-</li>
-<li>
-
-<p>version (1 byte): the segment version.</p>
-</li>
-<li>
-
-<p>empty bytes (6 bytes): reserved for future use.</p>
-</li>
-<li>
-
-<p>generation (4 bytes): generation of the segment, serialized as a big endian integer.</p>
-</li>
-<li>
-
-<p>number of references (4 bytes): number of references to external segments, serialized as a big endian integer.</p>
-</li>
-<li>
-
-<p>number of records (4 bytes): number of records in this segment, serialized as a big endian integer.</p>
-</li>
-<li>
-
-<p>empty bytes (10 bytes): reserved for future use.</p>
-</li>
-</ul>
-</li>
-<li>
-
-<p>the second part of the header is a variable-length list of references to external segments: one UUID per referenced segment, matching the number of references specified in the first part of the header.</p>
-</li>
-<li>
-
-<p>the third and last part of the header consists of a list of record header entries, matching the number of records specified in the first part of the header. Each record header consists of:</p>
+    
+<li>a magic number (3 bytes): identifies the beginning of a data segment.</li>
+  </ul>
+  
 <ul>
-
-<li>
-
-<p>record number (4 bytes), serialized as a big endian integer.</p>
-</li>
+    
+<li>version (1 byte): the segment version.</li>
+  </ul>
+  
+<ul>
+    
+<li>empty bytes (6 bytes): reserved for future use.</li>
+  </ul>
+  
+<ul>
+    
+<li>generation (4 bytes): generation of the segment, serialized as a big endian  integer.</li>
+  </ul>
+  
+<ul>
+    
+<li>number of references (4 bytes): number of references to external segments,  serialized as a big endian integer.</li>
+  </ul>
+  
+<ul>
+    
+<li>number of records (4 bytes): number of records in this segment, serialized  as a big endian integer.</li>
+  </ul>
+  
+<ul>
+    
+<li>empty bytes (10 bytes): reserved for future use.</li>
+  </ul></li>
+  
 <li>
-
-<p>record type (1 byte): can be one of <i>LEAF</i>, <i>BRANCH</i>, <i>BUCKET</i>, <i>LIST</i>, <i>VALUE</i>, <i>BLOCK</i>, <i>TEMPLATE</i>, <i>NODE</i> or <i>BLOB_ID</i>.</p>
-</li>
+<p>the second part of the header is a variable-length list of references to external segments: one UUID per referenced segment, matching the number of references specified in the first part of the header.</p></li>
+  
 <li>
-
-<p>record offset (4 bytes), serialized as a big endian integer: offset of the record counting from the end of the segment. The actual position of the record can be obtained by computing <tt>(segment size - offset)</tt>.</p>
-</li>
-</ul>
-</li>
+<p>the third and last part of the header consists of a list of record header  entries, matching the number of records specified in the first part of the  header. Each record header consists of:</p>
+  
+<ul>
+    
+<li>record number (4 bytes), serialized as a big endian integer.</li>
+  </ul>
+  
+<ul>
+    
+<li>record type (1 byte): can be one of <i>LEAF</i>, <i>BRANCH</i>, <i>BUCKET</i>, <i>LIST</i>,  <i>VALUE</i>, <i>BLOCK</i>, <i>TEMPLATE</i>, <i>NODE</i> or <i>BLOB_ID</i>.</li>
+  </ul>
+  
+<ul>
+    
+<li>record offset (4 bytes), serialized as a big endian integer: offset of the  record counting from the end of the segment. The actual position of the  record can be obtained by computing <tt>(segment size - offset)</tt>.</li>
+  </ul></li>
 </ul>
 <p>The actual records follow the segment header, stored at the offsets advertised by the corresponding record headers in the last part of the segment header.</p>
 <p>See <a href="records.html">Segments and records</a> for description of the various record types and their format.</p></div>
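The following Java sketch decodes a data segment header according to the three-part layout described above and computes each record's position as <tt>(segment size - offset)</tt>. It assumes that each referenced UUID occupies 16 bytes (two big endian longs) and that the buffer contains exactly one segment; it does not check the magic number, whose value is not given on this page. `SegmentHeader` is a hypothetical class, not Oak's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * Sketch: decode the header of a data segment laid out as described above.
 * Assumptions: the buffer holds exactly one segment (position to limit), and
 * each referenced segment UUID is stored as 16 bytes (two big endian longs).
 * The magic number is not validated because its value is not given here.
 */
final class SegmentHeader {
    static final int FIXED_PART_BYTES = 32;

    static final class RecordEntry {
        final int number;
        final byte type;
        final int offset;    // counted from the end of the segment
        final int position;  // segment size - offset, relative to the segment start

        RecordEntry(int number, byte type, int offset, int position) {
            this.number = number;
            this.type = type;
            this.offset = offset;
            this.position = position;
        }
    }

    final int version;
    final int generation;
    final List<UUID> references = new ArrayList<>();
    final List<RecordEntry> records = new ArrayList<>();

    private SegmentHeader(int version, int generation) {
        this.version = version;
        this.generation = generation;
    }

    static SegmentHeader parse(ByteBuffer segment) {
        ByteBuffer b = segment.duplicate().order(ByteOrder.BIG_ENDIAN);
        int base = b.position();
        int segmentSize = b.remaining();

        // Fixed part: magic (3) | version (1) | reserved (6) | generation (4)
        //             | #references (4) | #records (4) | reserved (10) = 32 bytes
        int version = b.get(base + 3) & 0xff;
        SegmentHeader header = new SegmentHeader(version, b.getInt(base + 10));
        int referenceCount = b.getInt(base + 14);
        int recordCount = b.getInt(base + 18);

        // Second part: one UUID per referenced segment.
        b.position(base + FIXED_PART_BYTES);
        for (int i = 0; i < referenceCount; i++) {
            header.references.add(new UUID(b.getLong(), b.getLong()));
        }

        // Third part: record number (4) | record type (1) | record offset (4).
        for (int i = 0; i < recordCount; i++) {
            int number = b.getInt();
            byte type = b.get();
            int offset = b.getInt();
            header.records.add(new RecordEntry(number, type, offset, segmentSize - offset));
        }
        return header;
    }
}
```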
@@ -412,143 +393,120 @@
 <p>The format of the binary references file is optimized for reading. The file is stored in reverse order to keep the most important information at the end of the file. This strategy is in line with the overall layout of the entries in the TAR file.</p>
 <p>The binary references file is divided into two parts: a header, and the actual data of the catalog.</p>
 <p>The binary references header contains the following fields:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>a magic number (4 bytes): identifies the beginning of a binary references file.</p>
-</li>
+<p>a magic number (4 bytes): identifies the beginning of a binary references file.</p></li>
+  
 <li>
-
-<p>size of the whole binary references mapping (4 bytes): number of bytes occupied by the entire structure holding binary references (per generation, per segment).</p>
-</li>
+<p>size of the whole binary references mapping (4 bytes): number of bytes occupied  by the entire structure holding binary references (per generation, per segment).</p></li>
+  
 <li>
-
-<p>number of generations (4 bytes): number of different generations of the segments that reference blobs.</p>
-</li>
+<p>number of generations (4 bytes): number of different generations of the segments that reference blobs.</p></li>
+  
 <li>
-
-<p>checksum (4 bytes): a CRC2 checksum of the content of the binary references file.</p>
-</li>
+<p>checksum (4 bytes): a CRC2 checksum of the content of the binary references  file.</p></li>
 </ul>
 <p>Immediately after the binary references header, the catalog data is stored. The storage scheme used is the following:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>generation of all the following segments.</p>
-</li>
+<p>generation of all the following segments.</p></li>
+  
 <li>
-
-<p>number of segment to binary references mappings for the current generation.</p>
-</li>
+<p>number of segment to binary references mappings for the current generation.</p></li>
+  
 <li>
-
 <p>for each mapping we have:</p>
+  
 <ul>
-
-<li>
-
-<p>UUID of the referencing segment.</p>
-</li>
-<li>
-
-<p>number of referenced blobs.</p>
-</li>
-<li>
-
-<p>an unordered enumeration of blob ids representing blobs referenced by the current segment.</p>
-</li>
-</ul>
-</li>
+    
+<li>UUID of the referencing segment.</li>
+  </ul>
+  
+<ul>
+    
+<li>number of referenced blobs.</li>
+  </ul>
+  
+<ul>
+    
+<li>an unordered enumeration of blob ids representing blobs referenced by the  current segment.</li>
+  </ul></li>
 </ul></div>
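The catalog in a binary references file can be pictured as a two-level mapping: generation, then referencing segment, then the set of referenced blob ids. The sketch below models only that in-memory shape, with hypothetical names; it deliberately makes no assumption about how each field is encoded on disk or about the backwards layout of the file.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.UUID;

/**
 * Sketch of the logical content of a binary references (.brf) file:
 * generation -> referencing segment -> blob ids. Only the in-memory shape is
 * modelled; the byte-level encoding of each field is not assumed.
 */
final class BinaryReferences {
    private final SortedMap<Integer, Map<UUID, Set<String>>> byGeneration = new TreeMap<>();

    /** Records that a segment of the given generation references a blob. */
    void add(int generation, UUID segment, String blobId) {
        byGeneration
                .computeIfAbsent(generation, g -> new LinkedHashMap<>())
                .computeIfAbsent(segment, s -> new LinkedHashSet<>())
                .add(blobId);
    }

    /** Value of the "number of generations" header field. */
    int generationCount() {
        return byGeneration.size();
    }

    /** Number of segment-to-blobs mappings stored for one generation. */
    int mappingCount(int generation) {
        return byGeneration.getOrDefault(generation, Collections.emptyMap()).size();
    }

    /** Union of the blob ids referenced by all segments of one generation. */
    Set<String> blobsOfGeneration(int generation) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> blobs : byGeneration.getOrDefault(generation, Collections.emptyMap()).values()) {
            result.addAll(blobs);
        }
        return result;
    }
}
```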
 <div class="section">
 <h2><a name="Graph_files"></a>Graph files</h2>
 <p>The graph file represents the relationships between segments stored inside or outside the TAR file. The graph is stored as an adjacency list of UUIDs, where each UUID represents a segment. Like the binary references file, the graph file is also stored backwards.</p>
 <p>The content of the graph file is divided in two parts: a graph header and a graph adjacency list.</p>
 <p>The graph header contains the following fields:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>a magic number (4 bytes): identifies the beginning of a graph file.</p>
-</li>
+<p>a magic number (4 bytes): identifies the beginning of a graph file.</p></li>
+  
 <li>
-
-<p>size of the graph adjacency list (4 bytes): number of bytes occupied by the graph adjacency list.</p>
-</li>
+<p>size of the graph adjacency list (4 bytes): number of bytes occupied by the  graph adjacency list.</p></li>
+  
 <li>
-
-<p>number of entries (4 bytes): how many adjacency lists are stored.</p>
-</li>
+<p>number of entries (4 bytes): how many adjacency lists are stored.</p></li>
+  
 <li>
-
-<p>checksum (4 bytes): a CRC2 checksum of the content of the graph file.</p>
-</li>
+<p>checksum (4 bytes): a CRC2 checksum of the content of the graph file.</p></li>
 </ul>
 <p>Immediately after the graph header, the graph adjacency list is stored. The storage scheme used is the following:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>UUID of the source segment.</p>
-</li>
+<p>UUID of the source segment.</p></li>
+  
 <li>
-
-<p>size of the adjacency list of the source segment.</p>
-</li>
+<p>size of the adjacency list of the source segment.</p></li>
+  
 <li>
-
-<p>an unordered enumeration of UUIDs representing target segments referenced by the source segment.</p>
-</li>
+<p>an unordered enumeration of UUIDs representing target segments referenced by  the source segment.</p></li>
 </ul></div>
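A sketch of the adjacency list encoding described above, assuming that each UUID is written as 16 bytes (two big endian longs) and each list size as a 4-byte int. For simplicity it reads and writes the adjacency list front to back and ignores the graph header and the backwards layout of the file; `SegmentGraph` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Sketch: encode and decode a segment graph as an adjacency list of UUIDs. */
final class SegmentGraph {
    /** Writes, for each source: source UUID, number of targets, then the target UUIDs. */
    static ByteBuffer encode(Map<UUID, List<UUID>> graph) {
        int bytes = 0;
        for (List<UUID> targets : graph.values()) {
            bytes += 16 + 4 + 16 * targets.size();
        }
        ByteBuffer b = ByteBuffer.allocate(bytes);
        for (Map.Entry<UUID, List<UUID>> entry : graph.entrySet()) {
            putUuid(b, entry.getKey());
            b.putInt(entry.getValue().size());
            for (UUID target : entry.getValue()) {
                putUuid(b, target);
            }
        }
        b.flip(); // make the written bytes readable
        return b;
    }

    /** Reads {@code entryCount} adjacency lists (the "number of entries" header field). */
    static Map<UUID, List<UUID>> decode(ByteBuffer b, int entryCount) {
        Map<UUID, List<UUID>> graph = new LinkedHashMap<>();
        for (int i = 0; i < entryCount; i++) {
            UUID source = new UUID(b.getLong(), b.getLong());
            int size = b.getInt();
            List<UUID> targets = new ArrayList<>(size);
            for (int j = 0; j < size; j++) {
                targets.add(new UUID(b.getLong(), b.getLong()));
            }
            graph.put(source, targets);
        }
        return graph;
    }

    private static void putUuid(ByteBuffer b, UUID id) {
        b.putLong(id.getMostSignificantBits()).putLong(id.getLeastSignificantBits());
    }
}
```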
 <div class="section">
 <h2><a name="Index_files"></a>Index files</h2>
 <p>The index file is an ordered list of references to the entries contained in the TAR file. The references are ordered by UUID and they point to the position in the file where the entry is stored. Like the graph file, the index file is also stored backwards.</p>
 <p>The index file is divided into two parts: an index header, and the actual index data.</p>
 <p>The index header contains the following fields:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>a magic number (4 bytes): identifies the beginning of an index file.</p>
-</li>
+<p>a magic number (4 bytes): identifies the beginning of an index file.</p></li>
+  
 <li>
-
-<p>size of the index (4 bytes): number of bytes occupied by the index data. This size also includes the padding bytes that are added to the index to align it with the TAR block boundary.</p>
-</li>
+<p>size of the index (4 bytes): number of bytes occupied by the index data. This size also includes the padding bytes that are added to the index to align it with the TAR block boundary.</p></li>
+  
 <li>
-
-<p>number of entries (4 bytes): how many entries the index contains.</p>
-</li>
+<p>number of entries (4 bytes): how many entries the index contains.</p></li>
+  
 <li>
-
-<p>checksum (4 bytes): a CRC32 checksum of the content of the index file.</p>
-</li>
+<p>checksum (4 bytes): a CRC32 checksum of the content of the index file.</p></li>
 </ul>
 <p>After the header, the content of the index starts. For every entry contained in the index, the following information is stored:</p>
-<ul>
 
+<ul>
+  
 <li>
-
-<p>the most significant bits of the UUID (8 bytes).</p>
-</li>
+<p>the most significant bits of the UUID (8 bytes).</p></li>
+  
 <li>
-
-<p>the least significant bits of the UUID (8 bytes).</p>
-</li>
+<p>the least significant bits of the UUID (8 bytes).</p></li>
+  
 <li>
-
-<p>the offset in the TAR file where the TAR entry containing the segment is located.</p>
-</li>
+<p>the offset in the TAR file where the TAR entry containing the segment is  located.</p></li>
+  
 <li>
-
-<p>the size of the entry in the TAR file.</p>
-</li>
+<p>the size of the entry in the TAR file.</p></li>
+  
 <li>
-
-<p>the generation of the entry.</p>
-</li>
+<p>the generation of the entry.</p></li>
 </ul>
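The index entries listed above have a fixed size, so entry <tt>i</tt> can be read directly at its offset. This sketch assumes (the page does not state it) that the offset, size, and generation are each 4-byte big endian integers, giving 28-byte entries; `IndexEntry` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/**
 * Sketch: decode the fixed-size entries of an index file.
 * Assumption: offset, size and generation are each stored as a 4-byte
 * big endian int, so every entry takes 8 + 8 + 4 + 4 + 4 = 28 bytes.
 */
final class IndexEntry {
    static final int BYTES = 8 + 8 + 4 + 4 + 4;

    final UUID id;
    final int offset;      // where the TAR entry holding the segment starts
    final int size;        // size of that TAR entry
    final int generation;  // generation of the segment

    IndexEntry(UUID id, int offset, int size, int generation) {
        this.id = id;
        this.offset = offset;
        this.size = size;
        this.generation = generation;
    }

    /** Reads entry number {@code i}; positions are absolute from the start of {@code entries}. */
    static IndexEntry read(ByteBuffer entries, int i) {
        int p = i * BYTES;
        UUID id = new UUID(entries.getLong(p), entries.getLong(p + 8));
        return new IndexEntry(id, entries.getInt(p + 16), entries.getInt(p + 20), entries.getInt(p + 24));
    }

    /** Header sanity check: the entries must fit in the declared size (which also counts padding). */
    static boolean fitsDeclaredSize(int entryCount, int declaredSize) {
        return (long) entryCount * BYTES <= declaredSize;
    }
}
```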
 <p>Since the entries in the index are sorted by UUID, and since the UUIDs assigned to the entries are uniformly distributed, an efficient algorithm called interpolation search can be used when searching for an entry by its UUID. This algorithm is a variation of binary search. While binary search halves the search space (in this case, the array of entries) at every iteration, interpolation search exploits the distribution of the keys to discard a portion of the search space that is potentially larger than half of it; for uniformly distributed keys, the expected number of iterations is O(log log n) rather than O(log n). Interpolation search is closer to the way a person searches a phone book. If the name being searched for begins with the letter T, for example, it makes no sense to open the phone book in the middle. It is far more efficient to open it near the beginning of its last quarter, since names starting with the letter T are more likely to be found in that part of the phone book.</p></div>
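A minimal Java sketch of interpolation search over the sorted UUIDs of an index. It assumes the entries are ordered by the most significant bits compared as unsigned values, then by the least significant bits; the probe position is interpolated from the most significant bits only, and the code falls back to bisection when they cannot be told apart. It illustrates the technique and is not Oak's implementation.

```java
/**
 * Sketch: interpolation search over index entries sorted by UUID.
 * Assumption: entries are ordered by the most significant bits compared as
 * unsigned values, then by the least significant bits (also unsigned).
 */
final class InterpolationSearch {
    /** Returns the position of (msb, lsb) in the sorted arrays, or -1 if it is absent. */
    static int find(long[] msbs, long[] lsbs, long msb, long lsb) {
        int low = 0;
        int high = msbs.length - 1;
        while (low <= high) {
            // Outside the remaining key range: the UUID cannot be present.
            if (Long.compareUnsigned(msb, msbs[low]) < 0 || Long.compareUnsigned(msb, msbs[high]) > 0) {
                return -1;
            }
            int guess;
            double span = unsigned(msbs[high]) - unsigned(msbs[low]);
            if (span > 0) {
                // Probe where the key should be if the keys are uniformly distributed.
                double fraction = (unsigned(msb) - unsigned(msbs[low])) / span;
                guess = low + (int) (fraction * (high - low));
                guess = Math.max(low, Math.min(high, guess)); // guard against rounding
            } else {
                guess = (low + high) >>> 1; // leading bits indistinguishable: bisect
            }
            int cmp = compare(msbs[guess], lsbs[guess], msb, lsb);
            if (cmp == 0) {
                return guess;
            } else if (cmp < 0) {
                low = guess + 1;
            } else {
                high = guess - 1;
            }
        }
        return -1;
    }

    private static int compare(long msb1, long lsb1, long msb2, long lsb2) {
        int c = Long.compareUnsigned(msb1, msb2);
        return c != 0 ? c : Long.compareUnsigned(lsb1, lsb2);
    }

    /** Converts an unsigned 64-bit value to a double (precision loss is acceptable for probing). */
    private static double unsigned(long v) {
        return (double) (v >>> 1) * 2.0 + (v & 1L);
    }
}
```

If the keys were not uniformly distributed, interpolation search could degrade to a linear number of probes; the uniformity of randomly generated UUIDs is what makes it a good fit here.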
         </div>