Posted to oak-commits@jackrabbit.apache.org by ch...@apache.org on 2017/03/22 12:55:39 UTC
svn commit: r1788104 - in /jackrabbit/oak/trunk/oak-doc/src/site:
markdown/nodestore/document/
markdown/nodestore/document/node-bundling-file.png
markdown/nodestore/document/node-bundling.md
markdown/nodestore/documentmk.md site.xml
Author: chetanm
Date: Wed Mar 22 12:55:39 2017
New Revision: 1788104
URL: http://svn.apache.org/viewvc?rev=1788104&view=rev
Log:
OAK-5918 - Document enhancements in DocumentNodeStore in 1.6
OAK-1312 - Bundle nodes into a document
Document Node Bundling
Added:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png (with props)
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
jackrabbit/oak/trunk/oak-doc/src/site/site.xml
Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png?rev=1788104&view=auto
==============================================================================
Binary file - no diff available.
Propchange: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling-file.png
------------------------------------------------------------------------------
svn:mime-type = image/png
Added: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md?rev=1788104&view=auto
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md (added)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md Wed Mar 22 12:55:39 2017
@@ -0,0 +1,305 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+# <a name="bundling-nodes"></a> Bundling Nodes
+
+* [Bundling Nodes](#bundling-nodes)
+    * [Usage](#bundling-usage)
+        * [Bundling Pattern](#bundling-pattern)
+    * [Bundling Examples](#bundling-example)
+        * [Bundling nt:file](#bundling-usage-file)
+        * [Bundling app:Asset](#bundling-usage-file)
+    * [Design Considerations](#bundling-design-considerations)
+    * [Benefits and Limitations](#bundling-benefits-limits)
+        * [Benefits](#bundling-benefits)
+        * [Limitations](#bundling-limits)
+
+`@since Oak 1.6`
+
+Document NodeStore stores each JCR node as a Document in the underlying `DocumentStore`.
+Depending on the backend, that Document is stored in one of the following ways:
+
+* Mongo - 1 JCR node is mapped to 1 Mongo Document in `nodes` collection
+* RDB - 1 JCR node is mapped to 1 row in `nodes` table
+
+The rest of this document uses the `MongoDocumentStore` to explain and illustrate
+bundling concepts.
+
+For very fine grained content with many nodes and only a few properties per node, it is more efficient to bundle
+multiple nodes into a single Mongo document.
+Such bundling mostly benefits reading because there are fewer roundtrips to the backend.
+At the same time the storage footprint is lower because metadata overhead is per document.
+This is especially important for indexes like `_id` and `_modified_1__id_1`, as they would have fewer
+entries indexed.
+
+## <a name="bundling-usage"></a> Usage
+
+Bundling is enabled on a per node type basis.
+Bundling definitions are stored as content in the repository under `/jcr:system/rep:documentStore/bundlor`.
+
+    + <node type name>
+      - pattern - multi
+
+For example, the content structure below enables bundling for nodes of type `nt:file` and `app:Asset`:
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+          + app:Asset
+            - pattern = ["jcr:content/metadata", "jcr:content/renditions", "jcr:content"]
+
+Once this is done, any node of these types created _after_ this point is stored in bundled format.
+
+* _Bundling Root_ - A node whose type has a bundling pattern defined
+* _Bundling Pattern_ - Pattern defined under the bundling config path which governs which relative nodes are bundled
+* _Micro Tree_ - The content structure which is bundled. Such content structures are _micro_ trees, and multiple
+  such micro trees form the whole repository tree
+* _Bundling Ratio_ - Number of JCR nodes bundled as part of one bundling root node. For example, for nt:file it is 2
+
+Key points to note here:
+
+1. Bundling patterns can be defined for either a mixin or a primary type
+2. Bundling patterns defined for mixins take precedence over those defined for the primary type
+3. Bundling only impacts content created after the bundling pattern is set
+4. Existing content is not modified
+5. This feature can be enabled or disabled at any time
+6. If bundling is disabled later, it only prevents bundling of nodes created after disabling. Existing
+   bundled nodes remain bundled
+7. The bundling pattern is baked into the created node. So if the bundling pattern is changed later, it only affects
+   new bundling roots created after the change
+8. Writes to `/jcr:system/rep:documentStore/bundlor` should be restricted to system admins, as this is an important
+   configuration and any misconfiguration here can have a severe adverse impact on the repository
+9. Node type inheritance is not considered while selecting the bundling rule for a node. The bundling pattern is
+   selected based on an exact match of the primary type or mixin type
+
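Points 1, 2 and 9 above together describe how a bundling pattern is picked for a newly created node. The Java sketch below illustrates that rule only; it is not Oak's implementation. The class `BundlingRuleSelector`, the in-memory map standing in for the bundlor configuration, and the node types `mix:assetLike` and `app:CustomFile` are hypothetical. The text does not say how several mixins that each define a pattern are resolved, so the sketch simply takes the first match.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch of bundling rule selection; not the Oak implementation. */
final class BundlingRuleSelector {
    // Hypothetical in-memory view of the bundlor config: node type name -> pattern
    private final Map<String, List<String>> patterns;

    BundlingRuleSelector(Map<String, List<String>> patterns) {
        this.patterns = patterns;
    }

    /** Returns the pattern for a new node, or empty if the node is not a bundling root. */
    Optional<List<String>> select(String primaryType, List<String> mixinTypes) {
        // Mixins are checked first because they take precedence over the primary type.
        // Taking the first matching mixin is an assumption of this sketch.
        for (String mixin : mixinTypes) {
            List<String> pattern = patterns.get(mixin);
            if (pattern != null) {
                return Optional.of(pattern);
            }
        }
        // Exact name match only: super types of primaryType are never consulted.
        return Optional.ofNullable(patterns.get(primaryType));
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<>();
        config.put("nt:file", Arrays.asList("jcr:content"));
        config.put("mix:assetLike", Arrays.asList("jcr:content", "jcr:content/metadata"));
        BundlingRuleSelector selector = new BundlingRuleSelector(config);

        // Primary type match
        System.out.println(selector.select("nt:file", Collections.emptyList()));
        // Mixin wins over primary type
        System.out.println(selector.select("nt:file", Arrays.asList("mix:assetLike")));
        // A hypothetical subtype of nt:file is not bundled: inheritance is ignored
        System.out.println(selector.select("app:CustomFile", Collections.emptyList()));
    }
}
```

Keeping the lookup to exact names means a custom subtype needs its own entry under the bundlor node before its instances are bundled.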
+### <a name="bundling-pattern"></a> Bundling Pattern
+
+Bundling pattern is a multi-value property. The pattern elements are a list of relative node paths which should be
+bundled as part of the _bundling root_. The relative node paths can be of the following types:
+
+* Static - Like 'jcr:content', 'jcr:content/metadata'.
+* Wildcard _(Experimental Feature)_ - Like 'jcr:content/renditions/**'. This would bundle all nodes under the relative
+  path 'jcr:content/renditions'
+
+**Support for wildcard patterns is currently experimental**
+
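The two kinds of pattern element can be read as a simple matching rule over relative paths. Below is a minimal Java sketch, assuming that a static element matches only the exact relative path and that a trailing `/**` matches every node strictly below its parent path. Whether the parent path itself is covered by the wildcard is not stated above, so the sketch leaves it out. `BundlingPatternSketch` is a hypothetical name, and this is not the matcher Oak uses internally.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative matcher for static and wildcard pattern elements; not the Oak implementation. */
final class BundlingPatternSketch {
    private static final String WILDCARD_SUFFIX = "/**";

    /** Checks one pattern element against a path relative to the bundling root. */
    static boolean matches(String pattern, String relativePath) {
        if (pattern.endsWith(WILDCARD_SUFFIX)) {
            // "a/b/**" -> every path strictly below "a/b" (assumption of this sketch)
            String parent = pattern.substring(0, pattern.length() - WILDCARD_SUFFIX.length());
            return relativePath.startsWith(parent + "/");
        }
        // Static element: exact relative path only
        return pattern.equals(relativePath);
    }

    /** A relative node is bundled if any pattern element matches it. */
    static boolean isBundled(List<String> patterns, String relativePath) {
        for (String pattern : patterns) {
            if (matches(pattern, relativePath)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList(
                "jcr:content", "jcr:content/metadata", "jcr:content/renditions/**");
        for (String path : Arrays.asList(
                "jcr:content",
                "jcr:content/renditions/original/jcr:content",
                "jcr:content/comments")) {
            System.out.println(path + " -> " + isBundled(patterns, path));
        }
    }
}
```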
+## <a name="bundling-example"></a> Bundling Examples
+
+### <a name="bundling-usage-file"></a> Bundling nt:file
+
+Let's take an example of an `nt:file` node like the one below:
+
+    + /content/book.jpg
+      - jcr:createdBy = "admin"
+      - jcr:primaryType = "nt:file"
+      + jcr:content
+        - jcr:data = <blob id>
+        - jcr:mimeType = "text/plain"
+        - jcr:uuid = "56befaee-f5fe-4252-87f8-0dcc8a624dd5"
+        - jcr:lastModifiedBy = "admin"
+        - jcr:primaryType = "nt:resource"
+
+![Bundling Nodes](node-bundling-file.png)
+
+Without bundling, this JCR node structure is stored in Mongo as 2 documents:
+
+ {
+ "_id" : "2:/content/book.jpg",
+ "jcr:created" : {"r151ce899ac3-0-1" : "\"dat:2015-12-23T16:41:43.055+05:30\""},
+ "_modified" : NumberLong(1450869100),
+ "_deleted" : { "r151ce899ac3-0-1" : "false"},
+ "jcr:createdBy" : { "r151ce899ac3-0-1" : "\"admin\""},
+ "_commitRoot" : { "r151ce899ac3-0-1" : "0"},
+ "_children" : true,
+ "jcr:primaryType": { "r151ce899ac3-0-1" : "\"nam:nt:file\""},
+ "_modCount" : NumberLong(1)
+ },
+ {
+ "_id" : "3:/content/book.jpg/jcr:content",
+ "_bin" : NumberLong(1),
+ "_modified" : NumberLong(1450869100),
+ "jcr:lastModified" : { "r151ce899ac3-0-1" : "\"dat:2015-12-23T16:41:43.056+05:30\""},
+ "_deleted" : { "r151ce899ac3-0-1" : "false" },
+ "jcr:data" : { "r151ce899ac3-0-1" : "\":blobId:xxx\""},
+ "_commitRoot" : { "r151ce899ac3-0-1" : "0" },
+ "jcr:mimeType" : { "r151ce899ac3-0-1" : "\"text/plain\""},
+ "jcr:uuid" : { "r151ce899ac3-0-1" : "\"56befaee-f5fe-4252-87f8-0dcc8a624dd5\""},
+ "jcr:lastModifiedBy": { "r151ce899ac3-0-1" : "\"admin\""},
+ "jcr:primaryType" : { "r151ce899ac3-0-1" : "\"nam:nt:resource\""},
+ "_modCount" : NumberLong(1)
+ }
+
+Now, with a bundling pattern like
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+
+the 2 nodes of the nt:file structure are bundled into the same Mongo Document:
+
+ {
+ "_id" : "2:/content/book.jpg",
+ "jcr:primaryType" : { "r15866f15753-0-1" : "\"nam:nt:file\""},
+ "jcr:content/jcr:primaryType" : { "r15866f15753-0-1" : "\"nam:nt:resource\""},
+ "_bin" : NumberLong(1),
+ ":doc-pattern" : { "r15866f15753-0-1" : "[\"str:jcr:content\"]"},
+ "jcr:content/jcr:data" : { "r15866f15753-0-1" : "\":blobId:xxx\""},
+ "_commitRoot" : { "r15866f15753-0-1" : "0" },
+ "jcr:content/jcr:uuid" : { "r15866f15753-0-1" : "\"ee045709-81c5-4164-ba08-c03b9c61b102\""},
+ "jcr:content/jcr:lastModifiedBy" : { "r15866f15753-0-1" : "\"admin\""},
+ "_deleted" : { "r15866f15753-0-1" : "false"},
+ "jcr:created" : { "r15866f15753-0-1" : "\"dat:2016-11-15T13:14:02.304+05:30\""},
+ "jcr:content/:doc-self-path" : {"r15866f15753-0-1" : "\"str:jcr:content\""},
+ "jcr:createdBy" : {"r15866f15753-0-1" : "\"admin\""},
+ "jcr:content/jcr:lastModified" : {"r15866f15753-0-1" : "\"dat:2016-11-15T13:14:02.305+05:30\""},
+ ":doc-has-child-bundled" : {"r15866f15753-0-1" : "true"},
+ "jcr:content/jcr:mimeType" : {"r15866f15753-0-1" : "\"text/plain\""},
+ "_modified" : NumberLong(1479195840),
+ "_modCount" : NumberLong(1)
+ }
+
+So with bundling, 1 nt:file creates 1 Mongo Document. 10M nt:file instances create 10M Mongo documents
+instead of 20M (without bundling).
+
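The bundled document above follows a visible naming scheme: each property of a bundled child is stored on the root document under `<relative path>/<property name>`, each bundled child gets a `<relative path>/:doc-self-path` marker, and the root carries `:doc-has-child-bundled`. The Java sketch below reproduces only that key layout. It is a simplification under stated assumptions: values are plain strings rather than revision-keyed maps, `:doc-pattern` and the `_`-prefixed system fields are omitted, and `BundledDocumentSketch` is a hypothetical name, not an Oak class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative flattening of a bundling root and its bundled children into one document. */
final class BundledDocumentSketch {

    /**
     * @param rootProps       properties of the bundling root, keyed by property name
     * @param bundledChildren properties of each bundled child, keyed by relative path
     * @return the flat key/value layout of the single bundled document
     */
    static Map<String, String> flatten(Map<String, String> rootProps,
                                       Map<String, Map<String, String>> bundledChildren) {
        Map<String, String> doc = new LinkedHashMap<>(rootProps);
        for (Map.Entry<String, Map<String, String>> child : bundledChildren.entrySet()) {
            String relPath = child.getKey();
            // Marker recording that this relative node exists inside the bundle
            doc.put(relPath + "/:doc-self-path", relPath);
            // Child properties are prefixed with the child's relative path
            for (Map.Entry<String, String> prop : child.getValue().entrySet()) {
                doc.put(relPath + "/" + prop.getKey(), prop.getValue());
            }
        }
        if (!bundledChildren.isEmpty()) {
            doc.put(":doc-has-child-bundled", "true");
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> file = new LinkedHashMap<>();
        file.put("jcr:primaryType", "nt:file");
        file.put("jcr:createdBy", "admin");

        Map<String, String> content = new LinkedHashMap<>();
        content.put("jcr:primaryType", "nt:resource");
        content.put("jcr:mimeType", "text/plain");

        Map<String, Map<String, String>> children = new LinkedHashMap<>();
        children.put("jcr:content", content);

        flatten(file, children).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```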
+### <a name="bundling-usage-file"></a> Bundling app:Asset
+
+Let's take a more complex content structure. Assume a nodetype `app:Asset` with the following content:
+
+    /content/banner.png
+      - jcr:primaryType = "app:Asset"
+      + jcr:content
+        - jcr:primaryType = "app:AssetContent"
+        + metadata
+          - status = "published"
+          + xmp
+            + 1
+              - softwareAgent = "Adobe Photoshop"
+              - author = "David"
+        + renditions (nt:folder)
+          + original (nt:file)
+            + jcr:content
+              - jcr:data = ...
+        + comments (nt:folder)
+
+The above structure has the following characteristics:
+
+* It consists of **static structure** like 'jcr:content', 'jcr:content/metadata'
+* It consists of **bounded structure** like 'jcr:content/renditions'. Under renditions it can have at most 5 types of nt:file node
+* It has **unbounded relative nodes** like 'jcr:content/comments' and 'jcr:content/metadata/xmp'. Nodes under these paths
+  can have unbounded content
+* Static and bounded structure take up to ~15 JCR Nodes (assuming 5 types of renditions)
+
+So 1 asset ~ 15 JCR Nodes and ~ 15 Mongo documents. Thus, by default, 10M assets would lead to 150M+ Mongo Documents.
+Such a structure can make use of Node Bundling to reduce this storage ratio.
+
+Let's define a bundling pattern like the one below:
+
+    + jcr:system
+      + rep:documentStore
+        + bundlor
+          + nt:file (oak:Unstructured)
+            - pattern = ["jcr:content"]
+          + app:Asset
+            - pattern = ["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]
+
+With this bundling pattern the same app:Asset structure is stored in 1 Mongo Document, excluding the 'comments' and 'xmp'
+nodes:
+
+ {
+
+ "_children": true,
+ "_modified": 1469081925,
+ "_id": "2:/test/book.jpg",
+ "_commitRoot": {"r1560c1b3db8-0-1": "0"},
+ "_deleted": {"r1560c1b3db8-0-1": "false"},
+ ":pattern": {
+ "r1560c1b3db8-0-1": "[\"str:jcr:content/metadata\",\"str:jcr:content/renditions\",\"str:jcr:content/renditions/**\",\"str:jcr:content\"]"
+ },
+ "jcr:primaryType": {"r1560c1b3db8-0-1": "\"str:app:Asset\""},
+
+ //Relative node jcr:content
+ "jcr:content/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content\""},
+ "jcr:content/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:oak:Unstructured\""},
+
+ //Relative node jcr:content/metadata
+ "jcr:content/metadata/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/metadata\""},
+ "jcr:content/metadata/status": {"r1560c1b3db8-0-1": "\"published\""},
+ "jcr:content/metadata/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:oak:Unstructured\""},
+
+ //Relative node jcr:content/renditions
+ "jcr:content/renditions/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions\""},
+ "jcr:content/renditions/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:folder\""},
+
+ //Relative node jcr:content/renditions/original
+ "jcr:content/renditions/original/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions/original\""},
+ "jcr:content/renditions/original/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:file\""},
+
+ //Relative node jcr:content/renditions/original/jcr:content
+ "jcr:content/renditions/original/jcr:content/:doc-self-path": {"r1560c1b3db8-0-1" : "\"str:jcr:content/renditions/original/jcr:content\""},
+ "jcr:content/renditions/original/jcr:content/jcr:primaryType": {"r1560c1b3db8-0-1": "\"nam:nt:resource\""},
+ "jcr:content/renditions/original/jcr:content/jcr:data": {"r1560c1b3db8-0-1": "\"<data>\""},
+ }
+
+
+## <a name="bundling-design-considerations"></a> Design Considerations
+
+While enabling bundling, consider the following points.
+
+**Enable bundling only for static and bounded relative node paths**
+
+As bundled nodes are stored in a single Mongo Document, care must be taken that the bundled document size stays within
+reasonable limits; otherwise Mongo (or RDB) would reject such heavy documents. So a bundling pattern should only include
+those relative node paths which are static or bounded.
+
+For example, in app:Asset it would be wrong to bundle nodes under 'jcr:content/comments', as comments can be unlimited and
+would bloat the bundled document. However, bundling nodes under 'jcr:content/renditions' should be fine, as
+application logic ensures that there are at most 4-5 rendition nodes of type nt:file.
+
+So take the content structure into account while setting up a bundling pattern.
+
+**Make use of custom mixins to mark unstructured content**
+
+If the content structure is mostly made up of nodes of type `nt:unstructured` or `oak:Unstructured`, try to identify
+subtrees which have a consistent structure and define a marker mixin to mark such subtrees. A bundling pattern can then be
+defined against such mixins.
+
+For more details on how bundling is implemented, refer to [OAK-1312][OAK-1312].
+
+## <a name="bundling-benefits-limits"></a> Benefits and Limitations
+
+### <a name="bundling-benefits"></a> Benefits
+
+* **Reduced latency for traversal** - If you have a structure like app:Asset, reading any of the relative nodes like
+'jcr:content/renditions' requires a JCR level traversal, which involves lots of queries for child nodes.
+With bundling all those queries are avoided.
+
+* **Reduced number of Documents in persistent store** - Currently, for a nodetype like app:Asset, 1 app:Asset = 15 JCR Nodes.
+If we have 10M assets then we would be consuming 150M documents in Mongo.
+With bundling this ratio can be reduced to, say, 1-5, which reduces the actual number of documents in Mongo. Fewer
+documents means a smaller size for the `_id` and `{_modified, _id}` indexes. A smaller index size allows storing many more
+Mongo documents, as index size is a key factor for sizing Mongo setups
+
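The document counts quoted above are a straight multiplication of the number of bundling roots by the number of Mongo documents each root needs. As a rough Java sketch of that arithmetic, using the figures from the text (10M assets, 15 documents per asset unbundled, 1 to 5 with bundling); `DocumentCountEstimate` is a hypothetical helper, and actual counts depend on how many unbundled nodes such as comments each asset has:

```java
/** Illustrative back-of-the-envelope estimate of Mongo document counts. */
final class DocumentCountEstimate {

    /** Documents needed for the given number of bundling roots; fails loudly on overflow. */
    static long documents(long roots, int documentsPerRoot) {
        return Math.multiplyExact(roots, (long) documentsPerRoot);
    }

    public static void main(String[] args) {
        long assets = 10_000_000L;
        System.out.println("unbundled, 15 docs per asset: " + documents(assets, 15));
        System.out.println("bundled, 5 docs per asset:    " + documents(assets, 5));
        System.out.println("bundled, 1 doc per asset:     " + documents(assets, 1));
    }
}
```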
+### <a name="bundling-limits"></a> Limitations
+
+Currently the bundling logic has no fallback in case the bundled document size exceeds the limit imposed by the persistent store.
+So ensure that bundling is limited and does not bundle lots of nodes.
+
+[OAK-1312]: https://issues.apache.org/jira/browse/OAK-1312
\ No newline at end of file
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md?rev=1788104&r1=1788103&r2=1788104&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md Wed Mar 22 12:55:39 2017
@@ -18,6 +18,7 @@
# <a name="oak-document-storage"></a> Oak Document Storage
* [Oak Document Storage](#oak-document-storage)
+ * [New in 1.6](#new-1.6)
* [Backend implementations](#backend-implementations)
* [Content Model](#content-model)
* [Node Content Model](#node-content-model)
@@ -25,6 +26,7 @@
* [Clock requirements](#clock-requirements)
* [Branches](#branches)
* [Previous Documents](#previous-documents)
+ * [Node Bundling](#node-bundling)
* [Background Operations](#background-operations)
* [Renew Cluster Id Lease](#renew-cluster-id-lease)
* [Background Document Split](#background-document-split)
@@ -46,6 +48,11 @@ The plugin implements the low level `Nod
The document storage optionally uses the [persistent cache](persistent-cache.html) to reduce read operations on the backend storage.
+## <a name="new-1.6"></a> New in 1.6
+
+* [Node Bundling](#node-bundling)
+
+
## <a name="backend-implementations"></a> Backend implementations
DocumentMK supports a number of backends, with a storage abstraction called `DocumentStore`:
@@ -373,7 +380,12 @@ Previous documents only contain immutabl
committed and merged `_revisions`. This also means the previous ranges of
committed data may overlap because branch commits are not moved to previous
documents until the branch is merged.
-
+
+## <a name="node-bundling"></a> Node Bundling
+
+`@since Oak 1.6`
+
+Refer to [Node Bundling](document/node-bundling.html)
## <a name="background-operations"></a> Background Operations
Modified: jackrabbit/oak/trunk/oak-doc/src/site/site.xml
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/site.xml?rev=1788104&r1=1788103&r2=1788104&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/site.xml (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/site.xml Wed Mar 22 12:55:39 2017
@@ -40,6 +40,7 @@ under the License.
<menu name="Features and Plugins">
<item href="nodestore/overview.html" name="Node Storage" collapse="false">
<item href="nodestore/documentmk.html" name="Document NodeStore" collapse="false">
+ <item href="nodestore/document/node-bundling.html" name="Node Bundling" />
<item href="nodestore/persistent-cache.html" name="Persistent Cache" />
<item href="clustering.html" name="Clustering" />
</item>