Posted to commits@jackrabbit.apache.org by ch...@apache.org on 2015/03/31 08:05:45 UTC

svn commit: r1670264 - in /jackrabbit/site/live/oak/docs: nodestore/documentmk.html osgi_config.html

Author: chetanm
Date: Tue Mar 31 06:05:45 2015
New Revision: 1670264

URL: http://svn.apache.org/r1670264
Log:
OAK-301- Document Oak

Document details around various caches used in Oak

Modified:
    jackrabbit/site/live/oak/docs/nodestore/documentmk.html
    jackrabbit/site/live/oak/docs/osgi_config.html

Modified: jackrabbit/site/live/oak/docs/nodestore/documentmk.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/nodestore/documentmk.html?rev=1670264&r1=1670263&r2=1670264&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/nodestore/documentmk.html (original)
+++ jackrabbit/site/live/oak/docs/nodestore/documentmk.html Tue Mar 31 06:05:45 2015
@@ -700,7 +700,8 @@ nodes
 <p>DocumentMK periodically checks documents for their size and if necessary splits them up and moves old data to a previous document. This is done in the background by each DocumentMK instance for the data it created.</p></div>
 <div class="section">
 <h3>Background Writes<a name="Background_Writes"></a></h3>
-<p>While performing commits there are certain nodes which are modified but do not become part of commit. For example when a node under /a/b/c is updated then the <tt>_lastRev</tt> property of all ancestors also need to be updated to the commit revision. Such changes are accumulated and flushed periodically through a asynchronous job.</p></div>
+<p>While performing commits there are certain nodes which are modified but do not become part of the commit. For example, when a node under /a/b/c is updated, the <tt>_lastRev</tt> property of all its ancestors also needs to be updated to the commit revision. Such changes are accumulated and flushed periodically through an asynchronous job.</p>
+<p><a name="bg-read"></a></p></div>
 <div class="section">
 <h3>Background Reads<a name="Background_Reads"></a></h3>
 <p>DocumentMK periodically picks up changes from other DocumentMK instances by polling the root node for changes of <tt>_lastRev</tt>. This happens once every second.</p></div></div>
@@ -755,7 +756,56 @@ nodes
 <pre>&gt; db.clusterNodes.update({}, 
   {$set: {readWriteMode:'readPreference=primary&amp;w=majority'}}, 
   {multi: true})    
-</pre></div></div></div></div>
+</pre></div>
+<p><a name="cache"></a></p></div></div></div>
+<div class="section">
+<h2>Caching<a name="Caching"></a></h2>
+<p><tt>DocumentNodeStore</tt> maintains multiple caches to provide optimum performance. By default the cached instances are kept in heap memory, but some of the caches can be backed by a <a href="persistent-cache.html">persistent cache</a>.</p>
+
+<ol style="list-style-type: decimal">
+  
+<li>
+<p><tt>documentCache</tt> - Document cache is used for caching <tt>NodeDocument</tt> instances. These are in-memory representations of the persistent state. For example, in the case of Mongo each instance maps to a Mongo document in the <tt>nodes</tt> collection, and for RDB it maps to a row in the <tt>NODES</tt> table.</p>
+<p>Depending on the <tt>DocumentStore</tt> implementation, different heuristics are applied for invalidating cache entries based on changes in the backend.</p></li>
+  
+<li>
+<p><tt>docChildrenCache</tt> - Document Children cache is used to cache the children state for a given parent node. It is invalidated completely upon every background read.</p></li>
+  
+<li>
+<p><tt>nodeCache</tt> - Node cache is used to cache <tt>DocumentNodeState</tt> instances. These are <b>immutable</b> views of a <tt>NodeDocument</tt> as seen at a given revision, hence no consistency checks need to be performed for them.</p></li>
+  
+<li>
+<p><tt>childrenCache</tt> - Children cache is used to cache the children of a given node. These entries are also <b>immutable</b> and represent the state of the children of a given parent at a certain revision.</p></li>
+  
+<li>
+<p><tt>diffCache</tt> - Caches the diff for the changes made between successive revisions. For local changes the diff is added to the cache upon commit, while for external changes the diff entries are added when the diff is computed as part of the observation call.</p></li>
+</ol>
+<p>All the above caches are managed on heap. For the last three, <tt>nodeCache</tt>, <tt>childrenCache</tt> and <tt>diffCache</tt>, Oak provides support for a <a href="persistent-cache.html">persistent cache</a>. By enabling the persistent cache feature Oak can manage a much larger cache off heap and thus free up heap memory for application usage.</p>
+<div class="section">
+<h3>Cache Invalidation<a name="Cache_Invalidation"></a></h3>
+<p><tt>documentCache</tt> and <tt>docChildrenCache</tt> contain mutable state, which requires consistency checks to keep them in sync with the state persisted in the backend. Oak uses an MVCC model under which it maintains a consistent view of a given node at a given revision. This allows each cluster node to use a local cache instead of a global clustered cache, because changes made by one cluster node need not be instantly reflected on all other nodes.</p>
+<p>Each cluster node periodically performs <a href="#bg-read">background reads</a> to pick up changes made by other cluster nodes. At that time it performs a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1156">consistency check</a> to ensure that the cached instance state reflects the state persisted in the backend. The time taken by this check is proportional to the number of entries present in the cache.</p>
+<p>For the repository to work properly it is important to ensure that such background reads do not consume much time, and <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2646">work is underway</a> to improve the current approach. <i>To ensure that such background operations (which include the cache invalidation checks) perform quickly, one should not set a large size for these caches</i>.</p>
+<p>All other caches consist of immutable state and hence no cache invalidation needs to be performed for them. For that reason those caches can be backed by the persistent cache, and even a large number of entries in such caches would not be a matter of concern.</p></div>
+<div class="section">
+<h3>Cache Configuration<a name="Cache_Configuration"></a></h3>
+<p>In a default setup the <a href="../osgi_config.html#document-node-store">DocumentNodeStoreService</a> takes a single config for <tt>cache</tt>, which is internally distributed among the various caches above in the following way:</p>
+
+<ol style="list-style-type: decimal">
+  
+<li><tt>nodeCache</tt> - 25%</li>
+  
+<li><tt>childrenCache</tt> - 10%</li>
+  
+<li><tt>docChildrenCache</tt> - 3%</li>
+  
+<li><tt>diffCache</tt> - 5%</li>
+  
+<li><tt>documentCache</tt> - is given the rest, i.e. 57%</li>
+</ol>
+<p>More recently, <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2546">options have been added</a> for fine-grained control over the distribution. See <a href="../osgi_config.html#cache-allocation">Cache Allocation</a>.</p>
+<p>While distributing, ensure that the cache left for <tt>documentCache</tt> is not very large, i.e. prefer to keep it at ~500 MB or lower, as a large <tt>documentCache</tt> can increase the time taken to perform cache invalidation.</p>
+<p>Further, make use of the persistent cache. This reduces pressure on GC by keeping instances off heap, with a slight decrease in performance compared to keeping them on heap.</p></div></div>
                   </div>
             </div>
           </div>

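Note on the Cache Configuration section added above: the percentage split can be illustrated with a small Java sketch. This is not Oak code. The class name CacheAllocation, the method distribute, the rejection of percentages that sum to 100 or more, and the 1000 MB total are all assumptions made for illustration. Only the default percentages (25, 10, 3, 5) and the rule that documentCache receives the remainder come from the documentation in this commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Oak: splits one cache budget by percentage. */
public class CacheAllocation {

    /**
     * Returns the size of each cache in the same unit as {@code total}.
     * documentCache receives whatever the four percentages leave over.
     */
    static Map<String, Long> distribute(long total, int nodePct, int childrenPct,
                                        int docChildrenPct, int diffPct) {
        int used = nodePct + childrenPct + docChildrenPct + diffPct;
        // Assumption of this sketch: reject settings that leave nothing for documentCache.
        if (nodePct < 0 || childrenPct < 0 || docChildrenPct < 0 || diffPct < 0 || used >= 100) {
            throw new IllegalArgumentException("percentages must be >= 0 and sum to less than 100");
        }
        long node = total * nodePct / 100;
        long children = total * childrenPct / 100;
        long docChildren = total * docChildrenPct / 100;
        long diff = total * diffPct / 100;

        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("nodeCache", node);
        sizes.put("childrenCache", children);
        sizes.put("docChildrenCache", docChildren);
        sizes.put("diffCache", diff);
        sizes.put("documentCache", total - node - children - docChildren - diff);
        return sizes;
    }

    public static void main(String[] args) {
        // Documented default percentages; 1000 is an arbitrary total in MB.
        distribute(1000, 25, 10, 3, 5)
                .forEach((name, size) -> System.out.println(name + " = " + size + " MB"));
    }
}
```

With the default percentages and a 1000 MB total, documentCache would receive 570 MB in this sketch, which is above the ~500 MB guidance given in the new text. That is the kind of case where the per-cache percentage options become useful.
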
Modified: jackrabbit/site/live/oak/docs/osgi_config.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/osgi_config.html?rev=1670264&r1=1670263&r2=1670264&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/osgi_config.html (original)
+++ jackrabbit/site/live/oak/docs/osgi_config.html Tue Mar 31 06:05:45 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-03-25
+ | Generated by Apache Maven Doxia at 2015-03-31
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150325" />
+    <meta name="Date-Revision-yyyymmdd" content="20150331" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak - Repository OSGi Configuration</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
@@ -195,7 +195,7 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2015-03-25</li>
+                  <li id="publishDate">Last Published: 2015-03-31</li>
                   <li class="divider">|</li> <li id="projectVersion">Version: 1.1-SNAPSHOT</li>
                       
                 
@@ -504,7 +504,20 @@
 <dd>DocumentNodeStore when running with Mongo would use <tt>MongoBlobStore</tt> by default unless a custom <tt>BlobStore</tt> is  configured. In such scenario the size of in memory cache for the frequently used blobs can be configured via  <tt>blobCacheSize</tt>.</dd>
 <dt>persistentCache</dt>
 <dd>Default &quot;&quot; (an empty string, meaning disabled)</dd>
-<dd>The persistent cache, which is stored in the local file system.</dd>
+<dd>The <a href="./nodestore/persistent-cache.html">persistent cache</a>, which is stored in the local file system.</dd>
+<dt><a name="cache-allocation"></a></dt>
+<dt>nodeCachePercentage</dt>
+<dd>Default 25</dd>
+<dd>Percentage of <tt>cache</tt> allocated for <tt>nodeCache</tt>. See <a href="./nodestore/documentmk.html#cache">Caching</a></dd>
+<dt>childrenCachePercentage</dt>
+<dd>Default 10</dd>
+<dd>Percentage of <tt>cache</tt> allocated for <tt>childrenCache</tt>. See <a href="./nodestore/documentmk.html#cache">Caching</a></dd>
+<dt>diffCachePercentage</dt>
+<dd>Default 5</dd>
+<dd>Percentage of <tt>cache</tt> allocated for <tt>diffCache</tt>. See <a href="./nodestore/documentmk.html#cache">Caching</a></dd>
+<dt>docChildrenCachePercentage</dt>
+<dd>Default 3</dd>
+<dd>Percentage of <tt>cache</tt> allocated for <tt>docChildrenCache</tt>. See <a href="./nodestore/documentmk.html#cache">Caching</a></dd>
 </dl>
 <p>Example config file</p>
 
@@ -526,7 +539,7 @@ db=oak
 </pre></div></li>
   
 <li>
-<p><b>Read Preferences and Write Concern</b> - These also can be spcified as part of Mongo URI. Refer to  <a href="documentmk.html#rw-preference">Read Preference and Write Concern</a> section for more details. For  e.g. following would set <i>readPreference</i> to <i>secondary</i> and prefer replica with tag <i>dc:ny,rack:1</i>.  It would also specify the write timeout to 10 sec</p>
+<p><b>Read Preferences and Write Concern</b> - These can also be specified as part of the Mongo URI. Refer to the <a href="./nodestore/documentmk.html#rw-preference">Read Preference and Write Concern</a> section for more details. For example, the following would set <i>readPreference</i> to <i>secondary</i> and prefer a replica with tag <i>dc:ny,rack:1</i>. It would also set the write timeout to 10 sec</p>
   
 <div class="source">
 <pre>mongodb://db1.example.net,db2.example.com?readPreference=secondary&amp;readPreferenceTags=dc:ny,rack:1&amp;readPreferenceTags=dc:ny&amp;readPreferenceTags=&amp;w=1&amp;wtimeoutMS=10000