Posted to oak-commits@jackrabbit.apache.org by ch...@apache.org on 2017/03/22 12:55:39 UTC

svn commit: r1788104 - in /jackrabbit/oak/trunk/oak-doc/src/site: markdown/nodestore/document/ markdown/nodestore/document/node-bundling-file.png markdown/nodestore/document/node-bundling.md markdown/nodestore/documentmk.md site.xml

Author: chetanm
Date: Wed Mar 22 12:55:39 2017
New Revision: 1788104

URL: http://svn.apache.org/viewvc?rev=1788104&view=rev
Log:
OAK-5918 - Document enhancements in DocumentNodeStore in 1.6
OAK-1312 - Bundle nodes into a document

Document Node Bundling

Added:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png   (with props)
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
    jackrabbit/oak/trunk/oak-doc/src/site/site.xml

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png?rev=1788104&view=auto
==============================================================================
Binary file - no diff available.

Propchange: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md?rev=1788104&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md Wed Mar 22 12:55:39 2017
@@ -0,0 +1,305 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+  -->
+  
+# <a name="bundling-nodes"></a> Bundling Nodes
+
+* [Bundling Nodes](#bundling-nodes)
+    * [Usage](#bundling-usage)
+        * [Bundling Pattern](#bundling-pattern)
+    * [Bundling Examples](#bundling-example)
+        * [Bundling nt:file](#bundling-usage-file)
+        * [Bundling app:Asset](#bundling-usage-file)
+    * [Design Considerations](#bundling-design-considerations)
+    * [Benefits and Limitations](#bundling-benefits-limits)
+        * [Benefits](#bundling-benefits)
+        * [Limitations](#bundling-limits)
+
+`@since Oak 1.6`
+
+The Document NodeStore stores JCR nodes as documents in the underlying `DocumentStore`.
+Depending on the backend, each document is stored as follows:
+
+* Mongo - 1 JCR node is mapped to 1 Mongo document in the `nodes` collection
+* RDB - 1 JCR node is mapped to 1 row in the `nodes` table
+
+The rest of this document focuses on the `MongoDocumentStore` to explain and illustrate
+bundling concepts.
+
+For very fine-grained content with many nodes and only a few properties per node, it is more efficient to bundle
+multiple nodes into a single Mongo document.
+Such bundling mostly benefits reading, because fewer round trips to the backend are needed.
+At the same time the storage footprint is lower, because metadata overhead is per document.
+This is especially important for indexes like `_id` and `_modified_1__id_1`, as they have fewer
+entries to index.
+
+## <a name="bundling-usage"></a> Usage
+
+Bundling is enabled on a per-node-type basis.
+Bundling definitions are stored as content in the repository under `/jcr:system/rep:documentStore/bundlor`:
+
+    + <node type name>
+      - pattern (multi-valued)
+      
+For example, the content structure below enables bundling for nodes of type `nt:file` and `app:Asset`:
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+          + app:Asset
+            - pattern = ["jcr:content/metadata", "jcr:content/renditions", "jcr:content"]
+
+Once this is done, any node of type `nt:file` created _after_ this point is stored in bundled format.
+
+* _Bundling Root_ - A node whose type has a bundling pattern defined
+* _Bundling Pattern_ - The pattern defined under the bundling config path that governs which relative nodes are bundled
+* _Micro Tree_ - The bundled content structure. Such structures are _micro_ trees, and many
+   such micro trees make up the whole repository tree
+* _Bundling Ratio_ - The number of JCR nodes stored in the bundling root's document. For example, for `nt:file` it is 2
+
+Key points to note:
+
+1. Bundling patterns can be defined for either a mixin or a primary type
+2. Bundling patterns defined for mixins take precedence over those defined for primary types
+3. Bundling only affects content created after the bundling pattern is set
+4. Existing content is not modified
+5. This feature can be enabled or disabled at any time.
+6. If bundling is disabled later, it only prevents bundling of nodes created after disabling. Existing
+  bundled nodes remain bundled
+7. The bundling pattern is baked into the created node. So if the bundling pattern is changed later, it only affects
+  new bundling roots created after the change
+8. Writes to `/jcr:system/rep:documentStore/bundlor` should be restricted to system admins, as this is an important
+  configuration and any misconfiguration here can have a severe adverse impact on the repository.
+9. Node type inheritance is not considered when selecting the bundling rule for a node. The bundling pattern is selected
+   based on an exact match of the primary type or mixin type
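The selection rules in points 2 and 9 can be illustrated with a small Python sketch. It assumes the bundlor configuration has already been read into a plain dictionary mapping node type names to pattern lists; `select_pattern`, `BUNDLOR_CONFIG`, the marker mixin `app:Bundled` and the subtype `app:SubAsset` are hypothetical names, not Oak APIs. Which mixin wins when several mixins have a pattern is not stated above, so "first listed wins" is an assumption.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical in-memory copy of /jcr:system/rep:documentStore/bundlor:
# node type name -> bundling pattern elements.
BUNDLOR_CONFIG: Dict[str, List[str]] = {
    "nt:file": ["jcr:content"],
    "app:Asset": ["jcr:content/metadata", "jcr:content/renditions", "jcr:content"],
    "app:Bundled": ["jcr:content"],  # hypothetical marker mixin
}


def select_pattern(config: Dict[str, List[str]],
                   primary_type: str,
                   mixins: Sequence[str] = ()) -> Optional[List[str]]:
    """Return the bundling pattern for a node, or None if it is no bundling root.

    Mixins are consulted before the primary type (point 2), and only exact
    type names are looked up: supertypes are never considered (point 9).
    If several mixins have a pattern, the first one listed wins (assumption).
    """
    for mixin in mixins:
        if mixin in config:
            return config[mixin]
    return config.get(primary_type)


print(select_pattern(BUNDLOR_CONFIG, "nt:file"))                     # ['jcr:content']
print(select_pattern(BUNDLOR_CONFIG, "app:Asset", ["app:Bundled"]))  # ['jcr:content']
# A hypothetical subtype of app:Asset gets no pattern: no inheritance.
print(select_pattern(BUNDLOR_CONFIG, "app:SubAsset"))                # None
```

The lookup is a plain dictionary probe, which mirrors the "exact match only" rule: nothing in the sketch ever walks a type hierarchy.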
+
+### <a name="bundling-pattern"></a> Bundling Pattern
+
+The bundling pattern is a multi-valued property. Its elements are relative node paths that should be
+bundled as part of the _bundling root_. The relative node paths can be of the following types:
+
+* Static - e.g. 'jcr:content', 'jcr:content/metadata'.
+* Wildcard _(Experimental Feature)_ - e.g. 'jcr:content/renditions/**'. This bundles all nodes under the relative
+path 'jcr:content/renditions'
+
+**Support for wildcard patterns is currently experimental**
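How a single relative path is checked against the two kinds of pattern elements can be sketched as follows. This is a simplified Python illustration, not Oak's matcher: it assumes a wildcard element ending in `/**` covers only nodes strictly below its prefix, as described above, and `matches`/`is_bundled` are hypothetical names.

```python
from typing import Iterable


def matches(element: str, relative_path: str) -> bool:
    """Check one pattern element against a path relative to the bundling root."""
    if element.endswith("/**"):
        # Wildcard: every node strictly below the prefix, at any depth.
        return relative_path.startswith(element[:-2])
    # Static: only the exact relative path.
    return relative_path == element


def is_bundled(pattern: Iterable[str], relative_path: str) -> bool:
    """A node is bundled if any element of the bundling pattern matches it."""
    return any(matches(element, relative_path) for element in pattern)


pattern = ["jcr:content", "jcr:content/metadata", "jcr:content/renditions/**"]
print(is_bundled(pattern, "jcr:content/metadata"))                         # True
print(is_bundled(pattern, "jcr:content/metadata/xmp"))                     # False
print(is_bundled(pattern, "jcr:content/renditions/original/jcr:content"))  # True
```

Note that a static element never covers descendants: 'jcr:content/metadata' bundles the metadata node but not 'jcr:content/metadata/xmp'.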
+
+## <a name="bundling-example"></a> Bundling Examples
+
+### <a name="bundling-usage-file"></a> Bundling nt:file
+
+Let's take an example `nt:file` node like the one below:
+
+    + /content/book.jpg 
+      - jcr:createdBy = "admin"
+      - jcr:primaryType = "nt:file"
+      + jcr:content
+         - jcr:data = <blob id>
+         - jcr:mimeType = "text/plain"
+         - jcr:uuid = "56befaee-f5fe-4252-87f8-0dcc8a624dd5"
+         - jcr:lastModifiedBy = "admin"
+         - jcr:primaryType = "nt:resource"
+         
+![Bundling Nodes](node-bundling-file.png)
+
+Without bundling, this JCR node structure is stored in Mongo as 2 documents:
+ 
+    {
+        "_id"                : "2:/content/book.jpg",
+        "jcr:created"        : {"r151ce899ac3-0-1" : "\"dat:2015-12-23T16:41:43.055+05:30\""},
+        "_modified"          : NumberLong(1450869100),
+        "_deleted"           : { "r151ce899ac3-0-1" : "false"},
+        "jcr:createdBy"      : { "r151ce899ac3-0-1" : "\"admin\""},
+        "_commitRoot"        : { "r151ce899ac3-0-1" : "0"},
+        "_children"          : true,
+        "jcr:primaryType"    : { "r151ce899ac3-0-1" : "\"nam:nt:file\""},
+        "_modCount"          : NumberLong(1)
+    },
+    {
+        "_id"                : "3:/content/book.jpg/jcr:content",
+        "_bin"               : NumberLong(1),
+        "_modified"          : NumberLong(1450869100),
+        "jcr:lastModified"   : { "r151ce899ac3-0-1" : "\"dat:2015-12-23T16:41:43.056+05:30\""},
+        "_deleted"           : { "r151ce899ac3-0-1" : "false" },
+        "jcr:data"           : { "r151ce899ac3-0-1" : "\":blobId:xxx\""},
+        "_commitRoot"        : { "r151ce899ac3-0-1" : "0" },
+        "jcr:mimeType"       : { "r151ce899ac3-0-1" : "\"text/plain\""},
+        "jcr:uuid"           : { "r151ce899ac3-0-1" : "\"56befaee-f5fe-4252-87f8-0dcc8a624dd5\""},
+        "jcr:lastModifiedBy" : { "r151ce899ac3-0-1" : "\"admin\""},
+        "jcr:primaryType"    : { "r151ce899ac3-0-1" : "\"nam:nt:resource\""},
+        "_modCount"          : NumberLong(1)
+    }
+
+Now, with a bundling pattern like
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+
+the 2 nodes of the `nt:file` structure are bundled into the same Mongo document:
+
+    {
+            "_id"                             : "2:/content/book.jpg",
+            "jcr:primaryType"                 : { "r15866f15753-0-1" : "\"nam:nt:file\""},
+            "jcr:content/jcr:primaryType"     : { "r15866f15753-0-1" : "\"nam:nt:resource\""},
+            "_bin"                            : NumberLong(1),
+            ":doc-pattern"                    : { "r15866f15753-0-1" : "[\"str:jcr:content\"]"},
+            "jcr:content/jcr:data"            : { "r15866f15753-0-1" : "\":blobId:xxx\""},
+            "_commitRoot"                     : { "r15866f15753-0-1" : "0" },
+            "jcr:content/jcr:uuid"            : { "r15866f15753-0-1" : "\"ee045709-81c5-4164-ba08-c03b9c61b102\""},
+            "jcr:content/jcr:lastModifiedBy"  : { "r15866f15753-0-1" : "\"admin\""},
+            "_deleted"                        : { "r15866f15753-0-1" : "false"},
+            "jcr:created"                     : { "r15866f15753-0-1" : "\"dat:2016-11-15T13:14:02.304+05:30\""},
+            "jcr:content/:doc-self-path"      : {"r15866f15753-0-1" : "\"str:jcr:content\""},
+            "jcr:createdBy"                   : {"r15866f15753-0-1" : "\"admin\""},
+            "jcr:content/jcr:lastModified"    : {"r15866f15753-0-1" : "\"dat:2016-11-15T13:14:02.305+05:30\""},
+            ":doc-has-child-bundled"          : {"r15866f15753-0-1" : "true"},
+            "jcr:content/jcr:mimeType"        : {"r15866f15753-0-1" : "\"text/plain\""},
+            "_modified"                       : NumberLong(1479195840),
+            "_modCount"                       : NumberLong(1)
+    }
+    
+So with bundling, 1 `nt:file` creates 1 Mongo document: 10M `nt:file` instances create 10M Mongo documents
+instead of 20M (without bundling).
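The key layout of the bundled document above can be reproduced with a small Python sketch that flattens a micro tree into a single dictionary: properties of bundled nodes are prefixed with their relative path, and each bundled node gets a `:doc-self-path` entry. It is a simplification under stated assumptions: only static pattern elements are handled, a node is only reached through a bundled parent, revision maps and value-type prefixes such as `nam:` are omitted, `_id`/`_modified` bookkeeping is left out, and only the root-level `:doc-has-child-bundled` flag from the example is set. `bundle` and the node dictionary shape are hypothetical, not Oak code.

```python
from typing import Any, Dict, List

# A node is {"props": {name: value}, "children": {name: node}} (assumed shape).
Node = Dict[str, Any]


def bundle(root: Node, pattern: List[str]) -> Dict[str, Any]:
    """Flatten a bundling root and its bundled descendants into one document."""
    doc: Dict[str, Any] = dict(root["props"])
    doc[":doc-pattern"] = list(pattern)

    def visit(node: Node, rel_path: str) -> bool:
        if rel_path not in pattern:
            return False  # not bundled: this node keeps its own document
        doc[f"{rel_path}/:doc-self-path"] = rel_path
        for name, value in node["props"].items():
            doc[f"{rel_path}/{name}"] = value
        for name, child in node["children"].items():
            visit(child, f"{rel_path}/{name}")
        return True

    # A list (not a generator) so that every child is visited before any() runs.
    bundled = [visit(child, name) for name, child in root["children"].items()]
    if any(bundled):
        doc[":doc-has-child-bundled"] = True
    return doc


book = {
    "props": {"jcr:primaryType": "nt:file", "jcr:createdBy": "admin"},
    "children": {
        "jcr:content": {
            "props": {"jcr:primaryType": "nt:resource", "jcr:mimeType": "text/plain"},
            "children": {},
        }
    },
}
for key, value in bundle(book, ["jcr:content"]).items():
    print(f"{key:30} {value}")
```

Running it with an empty pattern leaves `jcr:content` out of the document, which corresponds to the unbundled, two-document layout shown earlier.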
+
+### <a name="bundling-usage-file"></a> Bundling app:Asset
+  
+Let's take a more complex content structure. Assume a node type `app:Asset` with the following content:
+ 
+    + /content/banner.png
+      - jcr:primaryType = "app:Asset"
+      + jcr:content
+        - jcr:primaryType = "app:AssetContent"
+        + metadata
+          - status = "published"
+          + xmp
+            + 1
+              - softwareAgent = "Adobe Photoshop"
+              - author = "David"
+        + renditions (nt:folder)
+          + original (nt:file)
+            + jcr:content
+              - jcr:data = ...
+        + comments (nt:folder)
+        
+The above structure has the following characteristics:
+
+* It consists of a **static structure** like 'jcr:content', 'jcr:content/metadata'
+* It consists of a **bounded structure** like 'jcr:content/renditions'. Under renditions there can be at most 5 types of
+  `nt:file` nodes
+* It has **unbounded relative nodes** like 'jcr:content/comments' and 'jcr:content/metadata/xmp'. The number of nodes
+  under these paths is unbounded
+* The static and bounded structure takes up to ~15 JCR nodes (assuming 5 types of renditions)
+
+So 1 asset is ~15 JCR nodes and ~15 Mongo documents. By default, 10M assets would thus lead to 150M+ Mongo documents.
+Such a structure can make use of node bundling to reduce this storage ratio.
+
+Let's define a bundling pattern like the one below:
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+          + app:Asset
+            - pattern = ["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]
+            
+With this bundling pattern, the same `app:Asset` structure is stored in 1 Mongo document, excluding the 'comments' and
+'xmp' nodes, which remain in their own documents:
+
+    {
+      
+      "_children": true,
+      "_modified": 1469081925,
+      "_id": "2:/content/banner.png",
+      "_commitRoot": {"r1560c1b3db8-0-1": "0"},
+      "_deleted": {"r1560c1b3db8-0-1": "false"},
+      ":doc-pattern": {
+        "r1560c1b3db8-0-1": "[\"str:jcr:content/metadata\",\"str:jcr:content/renditions\",\"str:jcr:content/renditions/**\",\"str:jcr:content\"]"
+      },
+      "jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:app:Asset\""},
+    
+      //Relative node jcr:content
+      "jcr:content/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content\""},
+      "jcr:content/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:app:AssetContent\""},
+    
+      //Relative node jcr:content/metadata
+      "jcr:content/metadata/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/metadata\""},
+      "jcr:content/metadata/status": {"r1560c1b3db8-0-1": "\"published\""},
+      "jcr:content/metadata/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:oak:Unstructured\""},
+      
+      //Relative node jcr:content/renditions
+      "jcr:content/renditions/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions\""},
+      "jcr:content/renditions/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:folder\""},
+    
+      //Relative node jcr:content/renditions/original
+      "jcr:content/renditions/original/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions/original\""},
+      "jcr:content/renditions/original/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:file\""},
+    
+      //Relative node jcr:content/renditions/original/jcr:content
+      "jcr:content/renditions/original/jcr:content/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions/original/jcr:content\""},
+      "jcr:content/renditions/original/jcr:content/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:resource\""},
+      "jcr:content/renditions/original/jcr:content/jcr:data": {"r1560c1b3db8-0-1": "\"<data>\""}
+    }
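Reading works in the opposite direction: the flat keys have to be grouped back into nodes. A minimal Python sketch of that grouping, assuming (as in the example above) that every bundled node is announced by a `<relative path>/:doc-self-path` key and that property names never contain `/`. `split_bundle` is a hypothetical helper, and the sample document is a trimmed copy of the example with the revision id shortened to `r1`.

```python
from typing import Any, Dict

SELF_PATH = "/:doc-self-path"


def split_bundle(doc: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group the keys of a bundled document by relative node path ("" = root)."""
    # Longest paths first, so "a/b/p" is assigned to node "a/b" and not to "a".
    paths = sorted((k[: -len(SELF_PATH)] for k in doc if k.endswith(SELF_PATH)),
                   key=len, reverse=True)
    nodes: Dict[str, Dict[str, Any]] = {"": {}, **{p: {} for p in paths}}
    for key, value in doc.items():
        if key.endswith(SELF_PATH):
            continue  # marker entry, not a property
        owner = next((p for p in paths if key.startswith(p + "/")), "")
        nodes[owner][key[len(owner) + 1:] if owner else key] = value
    return nodes


doc = {
    "_commitRoot": {"r1": "0"},
    "_deleted": {"r1": "false"},
    "jcr:content/:doc-self-path": {"r1": "\"str:jcr:content\""},
    "jcr:content/metadata/:doc-self-path": {"r1": "\"str:jcr:content/metadata\""},
    "jcr:content/metadata/status": {"r1": "\"published\""},
    "jcr:content/renditions/original/jcr:content/:doc-self-path":
        {"r1": "\"str:jcr:content/renditions/original/jcr:content\""},
    "jcr:content/renditions/original/jcr:content/jcr:data": {"r1": "\"<data>\""},
}
for path, props in split_bundle(doc).items():
    print(repr(path), sorted(props))
```

Nodes such as 'comments' and 'xmp' never appear in this grouping; in this layout they are separate documents that have to be read on their own.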
+    
+
+## <a name="bundling-design-considerations"></a> Design Considerations
+
+When enabling bundling, consider the following points:
+
+**Enable bundling only for static and bounded relative node paths**
+
+As bundled nodes are stored in a single Mongo document, care must be taken that the bundled document size stays within
+reasonable limits; otherwise Mongo (or RDB) rejects such heavy documents (Mongo limits a document to 16 MB). So the
+bundling pattern should only include relative node paths that are static or bounded.
+ 
+For example, in `app:Asset` it would be wrong to bundle nodes under 'jcr:content/comments', as comments can be unlimited
+and would bloat the bundled document. However, bundling nodes under 'jcr:content/renditions' should be fine, as
+application logic ensures that there are at most 4-5 rendition nodes of type `nt:file`.
+
+So take the content structure into account when setting up a bundling pattern.
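A rough pre-flight check of bundle size can be sketched in Python. It assumes the length of the JSON encoding is an acceptable stand-in for the BSON size (it is only an approximation), uses MongoDB's 16 MB document limit, and applies a hypothetical 10% budget as a safety margin, because stored values sit in revision maps as shown above and take more space than the raw data. `estimated_size`, `within_budget` and `asset_with_comments` are illustrative names, not part of Oak.

```python
import json
from typing import Any, Dict

# MongoDB rejects documents whose BSON size exceeds 16 MB.
MONGO_MAX_DOC_BYTES = 16 * 1024 * 1024


def estimated_size(doc: Dict[str, Any]) -> int:
    """Approximate the stored size by the length of the UTF-8 JSON encoding."""
    return len(json.dumps(doc).encode("utf-8"))


def within_budget(doc: Dict[str, Any], fraction: float = 0.1) -> bool:
    """True if the estimate stays below a (hypothetical) fraction of the limit."""
    return estimated_size(doc) <= MONGO_MAX_DOC_BYTES * fraction


def asset_with_comments(count: int) -> Dict[str, Any]:
    """Hypothetical bundle that wrongly includes 'jcr:content/comments/**'."""
    doc: Dict[str, Any] = {"jcr:primaryType": "app:Asset"}
    for i in range(count):
        doc[f"jcr:content/comments/c{i}/text"] = "x" * 200
    return doc


for count in (10, 1_000, 100_000):
    doc = asset_with_comments(count)
    print(count, estimated_size(doc), within_budget(doc))
```

The growth is linear in the number of comments, which is exactly why an unbounded path like 'jcr:content/comments' does not belong in a bundling pattern.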
+
+**Make use of custom mixins to mark unstructured content**
+
+If the content structure is mostly made up of nodes of type `nt:unstructured` or `oak:Unstructured`, try to identify
+subtrees that have a consistent structure and define a marker mixin to mark such subtrees. Bundling patterns can then be
+defined against such mixins.
+
+For more details on how bundling is implemented, refer to [OAK-1312][OAK-1312]
+
+## <a name="bundling-benefits-limits"></a> Benefits and Limitations
+
+### <a name="bundling-benefits"></a> Benefits
+
+* **Reduced latency for traversal** - If you have a structure like `app:Asset` and traverse it, the traversal involves
+lots of queries for child nodes, because the JCR-level traversal has to read each relative node such as
+'jcr:content/renditions'. With bundling, these queries are avoided.
+
+* **Reduced number of documents in persistent store** - For a node type like `app:Asset`, where 1 `app:Asset` = 15 JCR
+nodes, 10M assets consume 150M documents in Mongo.
+If bundling reduces this ratio to, say, 1-5, the actual number of documents in Mongo drops accordingly. Fewer
+documents mean smaller `_id` and `{_modified, _id}` indexes. Smaller indexes allow storing many more
+Mongo documents, as index size is a key factor when sizing Mongo setups.
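The document counts quoted in this section follow from simple multiplication; the Python sketch below reproduces them. `documents_per_root` is how many Mongo documents a single bundling root and its subtree occupy, `total_documents` is a hypothetical helper, and the 1-5 range is the illustrative figure from the text, not a measurement.

```python
def total_documents(roots: int, documents_per_root: int) -> int:
    """Mongo documents needed for a number of bundling roots."""
    return roots * documents_per_root


TEN_MILLION = 10_000_000

# nt:file: 2 documents unbundled, 1 document with "jcr:content" bundled.
print(total_documents(TEN_MILLION, 2), total_documents(TEN_MILLION, 1))

# app:Asset: ~15 documents unbundled, between 5 and 1 with bundling.
for per_root in (15, 5, 1):
    print(per_root, total_documents(TEN_MILLION, per_root))
```

Since each document contributes one entry to the `_id` index and one to the `{_modified, _id}` index, those index entry counts shrink by the same factor.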
+
+### <a name="bundling-limits"></a> Limitations
+
+Currently the bundling logic has no fallback if the bundled document size exceeds the limit imposed by the persistent
+store. So ensure that bundling is limited and does not bundle lots of nodes.
+
+[OAK-1312]: https://issues.apache.org/jira/browse/OAK-1312
\ No newline at end of file

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md?rev=1788104&r1=1788103&r2=1788104&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md Wed Mar 22 12:55:39 2017
@@ -18,6 +18,7 @@
 # <a name="oak-document-storage"></a> Oak Document Storage
 
 * [Oak Document Storage](#oak-document-storage)
+    * [New in 1.6](#new-1.6)
     * [Backend implementations](#backend-implementations)
     * [Content Model](#content-model)
     * [Node Content Model](#node-content-model)
@@ -25,6 +26,7 @@
     * [Clock requirements](#clock-requirements)
     * [Branches](#branches)
     * [Previous Documents](#previous-documents)
+    * [Node Bundling](#node-bundling)
     * [Background Operations](#background-operations)
         * [Renew Cluster Id Lease](#renew-cluster-id-lease)
         * [Background Document Split](#background-document-split)
@@ -46,6 +48,11 @@ The plugin implements the low level `Nod
 
 The document storage optionally uses the [persistent cache](persistent-cache.html) to reduce read operations on the backend storage.
 
+## <a name="new-1.6"></a> New in 1.6
+
+* [Node Bundling](#node-bundling)
+
+
 ## <a name="backend-implementations"></a> Backend implementations
 
 DocumentMK supports a number of backends, with a storage abstraction called `DocumentStore`:
@@ -373,7 +380,12 @@ Previous documents only contain immutabl
 committed and merged `_revisions`. This also means the previous ranges of
 committed data may overlap because branch commits are not moved to previous
 documents until the branch is merged.
- 
+
+## <a name="node-bundling"></a> Node Bundling
+
+`@since Oak 1.6`
+
+Refer to [Node Bundling](document/node-bundling.html)
 
 ## <a name="background-operations"></a> Background Operations
 

Modified: jackrabbit/oak/trunk/oak-doc/src/site/site.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/site.xml?rev=1788104&r1=1788103&r2=1788104&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/site.xml (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/site.xml Wed Mar 22 12:55:39 2017
@@ -40,6 +40,7 @@ under the License.
     <menu name="Features and Plugins">
       <item href="nodestore/overview.html" name="Node Storage" collapse="false">
         <item href="nodestore/documentmk.html" name="Document NodeStore" collapse="false">
+          <item href="nodestore/document/node-bundling.html" name="Node Bundling" />
           <item href="nodestore/persistent-cache.html" name="Persistent Cache" />
           <item href="clustering.html" name="Clustering" />
         </item>