Posted to oak-commits@jackrabbit.apache.org by mr...@apache.org on 2017/03/23 08:24:50 UTC

svn commit: r1788190 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md

Author: mreutegg
Date: Thu Mar 23 08:24:50 2017
New Revision: 1788190

URL: http://svn.apache.org/viewvc?rev=1788190&view=rev
Log:
OAK-5918: Document enhancements in DocumentNodeStore in 1.6

Minor corrections

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md?rev=1788190&r1=1788189&r2=1788190&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md Thu Mar 23 08:24:50 2017
@@ -33,28 +33,28 @@
 Document NodeStore stores the JCR nodes as Document in underlying `DocumentStore`.
 So depending on backend that Document is stored in following way
 
-* Mongo - 1 JCR node is mapped to 1 Mongo Document in `nodes` collection
+* MongoDB - 1 JCR node is mapped to 1 MongoDB Document in `nodes` collection
 * RDB - 1 JCR node is mapped to 1 row in `nodes` table
 
 The remaining part of the document will focus on the `MongoDocumentStore` to explain and illustrate 
 bundling concepts. 
 
-For very fine grained content with many nodes and only few properties per node it would be more efficient to bundle 
-multiple nodes into a single Mongo document. 
-Such bundling would mostly benefit reading because there are less roundtrips to the backend. 
-At the same time storage footprint would be lower because metadata overhead is per document. 
-This is specially important for the various indexes like `_id` and `_modified_1__id_1` as they would have lesser 
+For very fine grained content with many nodes and only few properties per node it is more efficient to bundle 
+multiple nodes into a single MongoDB document. 
+Such bundling mostly benefits reading because there are fewer round-trips to the backend. 
+At the same time storage footprint is lower because metadata overhead is per document. 
+This is especially important for the various indexes like `_id` and `_modified_1__id_1` as they have fewer 
 entries indexed.
 
 ## <a name="bundling-usage"></a> Usage
 
 Bundling is enabled on per nodetype basis. 
-Bundling definitions are defined as content in repository under `/jcr:system/rep:documentStore/bundlor`.
+Bundling definitions are defined as content in the repository under `/jcr:system/rep:documentStore/bundlor`.
  
     + <node type name>
       - pattern - multi 
       
-For example below content structure would enable bundling for nodes of type `nt:file` and `app:Asset`
+For example below content structure enables bundling for nodes of type `nt:file` and `app:Asset`
 
     + jcr:system
       + documentstore
@@ -64,7 +64,9 @@ For example below content structure woul
           + app:Asset
             - pattern = ["jcr:content/metadata", "jcr:content/renditions", "jcr:content"]
 
-Once this is done any node of type `nt:file` created _after_ this would be stored in bundled format.
+Once this is done any node of type `nt:file` created _after_ this will be stored
+in bundled format. Nodes created _before_ the configuration was added are not
+affected and their underlying documents are not rewritten.
 
 * _Bundling Roots_ - Nodes having type for which bundling patterns are defined
 * _Bundling Pattern_ - Pattern defined under bundling config path which governs which all relative nodes are bundled
@@ -74,24 +76,24 @@ Once this is done any node of type `nt:f
 
 Key points to note here
 
-1. Bundling patterns can be defined for either mixin or primaryType
-2. Bundling pattern defined for mixins take precedence over those defined for primary type
-3. Bundling only impacts content created after the bundling pattern is set
-4. Existing content would not be modified
+1. Bundling patterns can be defined for either jcr:mixinTypes or jcr:primaryType.
+2. Bundling pattern defined for mixins take precedence over those defined for primary node type.
+3. Bundling only impacts content created after the bundling pattern is set.
+4. Existing content is not modified.
 5. This feature can be enabled or disabled anytime. 
-6. If bundling is disable later then it would only prevent bundling of nodes created after disabling. Existing
-  bundled nodes would remain bundled
-7. Bundling pattern is baked in into the created node. So if bundling pattern is changed later it would only affect
-  new bundled roots created after the change
+6. If bundling is disabled later then it only prevents bundling of nodes created after disabling. Existing
+  bundled nodes remain bundled.
+7. Bundling pattern is baked into the created node. So if bundling pattern is changed later, it only affects
+  new bundled roots created after the change.
 8. Writes to `/jcr:system/rep:documentStore/bundlor` should be restricted to system admin as this is an important
   configuration and any misconfiguration here can have severe adverse impact on repository.
-9. While selecting bundling rule for any node node type inheritance is not considered. Bundling pattern is selected 
-   based on exact match of primary type or mixin type
+9. While selecting bundling rule for any node, node type inheritance is _not_ considered. Bundling pattern is selected 
+   based on exact match of jcr:mixinTypes or jcr:primaryType names.
 
 ### <a name="bundling-pattern"></a> Bundling Pattern
 
 Bundling pattern is a multi-value property. The pattern elements are list of relative node paths which should be 
-bundled as part of _bundling root_. The relative node paths can be of following type
+bundled as part of _bundling root_. The relative node paths can be of following type:
 
 * Static - Like 'jcr:content', 'jcr:content/metadata'. 
 * Wildcard _(Experimental Feature)_ - Like 'jcr:content/renditions/**'. This would bundle all nodes under relative 
@@ -117,7 +119,7 @@ Lets take an example of `nt:file` node l
          
 ![Bundling Nodes](node-bundling-file.png)
 
-This JCR node structure would be stored in Mongo in 2 documents
+This JCR node structure would be stored in MongoDB in 2 documents
  
     {
            "_id"            : "2:/content/book.jpg",
@@ -153,7 +155,7 @@ Now with bundling pattern like
           + nt:file (oak:Unstructured)
             - pattern = ["jcr:content"]
 
-Would bundle the 2 nodes in nt:file node structure in same Mongo Document
+Would bundle the 2 nodes in nt:file node structure in same MongoDB Document
 
     {
             "_id" 			                      : "2:/content/book.jpg",
@@ -176,7 +178,7 @@ Would bundle the 2 nodes in nt:file node
             "_modCount"                       : NumberLong(1)
     }
     
-So with bundling 1 nt:file would create 1 Mongo Document. 10M nt:file instance would create 10M Mongo documents 
+So with bundling 1 nt:file would create 1 MongoDB Document. 10M nt:file instances would create 10M MongoDB documents 
 instead of 20M (without bundling)
 
 ### <a name="bundling-usage-file"></a> Bundling app:Asset
@@ -207,7 +209,7 @@ Above structure has following characteri
   can have unbounded content
 * Static and bounded structure take upto ~15 JCR Nodes (assuming 5 types of renditions)
 
-So 1 asset ~ 15 JCR Nodes and ~ 15 Mongo documents. Thus by default 10M assets would lead to 150M+ Mongo Documents.
+So 1 asset ~ 15 JCR Nodes and ~ 15 MongoDB documents. Thus by default 10M assets would lead to 150M+ MongoDB Documents.
 Such a structure can make use of Node Bundling to reduce this storage ratio. 
 
 Lets define a bundling pattern like below
@@ -220,7 +222,7 @@ Lets define a bundling pattern like belo
           + app:Asset
             - pattern = ["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]
             
-With this bundling pattern same app:Asset structure would be stored in 1 Mongo Document excluding 'comments' and 'xmp'
+With this bundling pattern same app:Asset structure would be stored in 1 MongoDB Document excluding 'comments' and 'xmp'
 nodes
 
     {
@@ -261,12 +263,12 @@ nodes
 
 ## <a name="bundling-design-considerations"></a> Design Considerations
 
-While enabling bundling consider following points
+While enabling bundling consider following points:
 
 **Enable bundling only for static and bounded relative node paths**
 
-As bundled nodes are stored in single Mongo Document care must be taken such that Bundled Document size is within 
-reasonable limits otherwise Mongo (or RDB) would reject such heavy documents. So bundling pattern should only include
+As bundled nodes are stored in single MongoDB Document care must be taken such that bundled Document size is within 
+reasonable limits otherwise MongoDB (or RDB) would reject such heavy documents. So bundling pattern should only include
 those relative node paths which are static or bounded.
  
 For example in app:Asset it would be wrong to bundle nodes under 'jcr:content/comments' as comments can be unlimited and
@@ -279,7 +281,7 @@ So take into account the content structu
 
 If the content structure is mostly made up of nodes of type `nt:unstructured` or `oak:Unstructured` try to identify
 subtree which have consistent structure and define a marker mixin to mark such subtrees. Then bundling pattern can be 
-defined against such mixins
+defined against such mixins.
 
 For more details on how bundling is implemented refer to [OAK-1312][OAK-1312]
 
@@ -292,14 +294,14 @@ lots of queries for child nodes as JCR l
 'jcr:content/renditions'. With bundling all those queries are avoided. 
  
 * **Reduced number of Documents in persistent store** - Currently for a nodetype like app:Asset where 1 app:Asset = 15 JCR Nodes. 
-If we have 10M assets then we would be consuming 150 M documents in Mongo. 
-With bundling this ratio can be reduced to say 1-5 then it would reduce actual number of documents in Mongo. Lesser 
-number of documents means lesser size for _id and {_modified, _id} index. Lesser index size would allow storing lot more
-Mongo documents as index size is key factor for sizing Mongo setups
+If we have 10M assets then we would be consuming 150 M documents in MongoDB. 
+With bundling this ratio can be reduced to say 1-5, which reduces the actual number of documents in MongoDB.
+Fewer documents means reduced size for _id and {_modified, _id} index. Reduced index size allows storing a lot more
+MongoDB documents as index size is key factor for sizing MongoDB setups.
 
 ### <a name="bundling-limits"></a> Limitations
 
 Currently bundling logic has no fallback in case bundle document size exceeds the size imposed by persistent store.
-So try to ensure that bundling is limited and does not bundle lots of nodes
+So try to ensure that bundling is limited and does not bundle lots of nodes.
 
 [OAK-1312]: https://issues.apache.org/jira/browse/OAK-1312
\ No newline at end of file
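
As a rough illustration of the document counts quoted in the modified page, here is a minimal Python sketch. It is not Oak code. It assumes a simplified, hypothetical model in which the bundling root always costs one document, a descendant whose relative path exactly equals a static pattern entry is folded into the root document, and every other descendant costs one document of its own. The function name `count_documents` is made up for this example, and wildcard patterns are deliberately left out.

```python
def count_documents(relative_paths, patterns):
    """Return the number of documents needed for one bundling root.

    Hypothetical model (an assumption, not Oak's implementation):
    the root costs one document, descendants whose relative path
    exactly matches a static pattern are stored inside the root
    document, and all other descendants cost one document each.
    """
    bundled = set(patterns)
    separate = [path for path in relative_paths if path not in bundled]
    return 1 + len(separate)


# An nt:file node with a single jcr:content child, as in the example above.
nt_file_children = ["jcr:content"]

without_bundling = count_documents(nt_file_children, [])
with_bundling = count_documents(nt_file_children, ["jcr:content"])

files = 10_000_000
print(without_bundling * files)  # 20000000 documents without bundling
print(with_bundling * files)     # 10000000 documents with bundling
```

Under this model the 10M `nt:file` case reproduces the 20M versus 10M figures from the page. The same function can be applied to a list of relative paths for an `app:Asset` structure to estimate how many documents remain outside the bundle.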