Posted to oak-commits@jackrabbit.apache.org by mr...@apache.org on 2017/03/23 08:24:50 UTC
svn commit: r1788190 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
Author: mreutegg
Date: Thu Mar 23 08:24:50 2017
New Revision: 1788190
URL: http://svn.apache.org/viewvc?rev=1788190&view=rev
Log:
OAK-5918: Document enhancements in DocumentNodeStore in 1.6
Minor corrections
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md?rev=1788190&r1=1788189&r2=1788190&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/document/node-bundling.md Thu Mar 23 08:24:50 2017
@@ -33,28 +33,28 @@
Document NodeStore stores the JCR nodes as Document in underlying `DocumentStore`.
So depending on backend that Document is stored in following way
-* Mongo - 1 JCR node is mapped to 1 Mongo Document in `nodes` collection
+* MongoDB - 1 JCR node is mapped to 1 MongoDB Document in `nodes` collection
* RDB - 1 JCR node is mapped to 1 row in `nodes` table
The remaining part of the document will focus on the `MongoDocumentStore` to explain and illustrate
bundling concepts.
-For very fine grained content with many nodes and only few properties per node it would be more efficient to bundle
-multiple nodes into a single Mongo document.
-Such bundling would mostly benefit reading because there are less roundtrips to the backend.
-At the same time storage footprint would be lower because metadata overhead is per document.
-This is specially important for the various indexes like `_id` and `_modified_1__id_1` as they would have lesser
+For very fine grained content with many nodes and only few properties per node it is more efficient to bundle
+multiple nodes into a single MongoDB document.
+Such bundling mostly benefits reading because there are less round-trips to the backend.
+At the same time storage footprint is lower because metadata overhead is per document.
+This is specially important for the various indexes like `_id` and `_modified_1__id_1` as they have less
entries indexed.
## <a name="bundling-usage"></a> Usage
Bundling is enabled on per nodetype basis.
-Bundling definitions are defined as content in repository under `/jcr:system/rep:documentStore/bundlor`.
+Bundling definitions are defined as content in the repository under `/jcr:system/rep:documentStore/bundlor`.
+ <node type name>
- pattern - multi
-For example below content structure would enable bundling for nodes of type `nt:file` and `app:Asset`
+For example below content structure enables bundling for nodes of type `nt:file` and `app:Asset`
+ jcr:system
+ documentstore
@@ -64,7 +64,9 @@ For example below content structure woul
+ app:Asset
- pattern = ["jcr:content/metadata", "jcr:content/renditions", "jcr:content"]
-Once this is done any node of type `nt:file` created _after_ this would be stored in bundled format.
+Once this is done any node of type `nt:file` created _after_ this will be stored
+in bundled format. Nodes created _before_ the configuration was added are not
+affected and their underlying documents are not rewritten.
* _Bundling Roots_ - Nodes having type for which bundling patterns are defined
* _Bundling Pattern_ - Pattern defined under bundling config path which governs which all relative nodes are bundled
@@ -74,24 +76,24 @@ Once this is done any node of type `nt:f
Key points to note here
-1. Bundling patterns can be defined for either mixin or primaryType
-2. Bundling pattern defined for mixins take precedence over those defined for primary type
-3. Bundling only impacts content created after the bundling pattern is set
-4. Existing content would not be modified
+1. Bundling patterns can be defined for either jcr:mixinTypes or jcr:primaryType.
+2. Bundling pattern defined for mixins take precedence over those defined for primary node type.
+3. Bundling only impacts content created after the bundling pattern is set.
+4. Existing content is not modified.
5. This feature can be enabled or disabled anytime.
-6. If bundling is disable later then it would only prevent bundling of nodes created after disabling. Existing
- bundled nodes would remain bundled
-7. Bundling pattern is baked in into the created node. So if bundling pattern is changed later it would only affect
- new bundled roots created after the change
+6. If bundling is disable later then it only prevents bundling of nodes created after disabling. Existing
+ bundled nodes remain bundled.
+7. Bundling pattern is baked in into the created node. So if bundling pattern is changed later, it only affects
+ new bundled roots created after the change.
8. Writes to `/jcr:system/rep:documentStore/bundlor` should be restricted to system admin as this is an important
configuration and any mis configuration here can have severe adverse impact on repository.
-9. While selecting bundling rule for any node node type inheritance is not considered. Bundling pattern is selected
- based on exact match of primary type or mixin type
+9. While selecting bundling rule for any node node type inheritance is _not_ considered. Bundling pattern is selected
+ based on exact match of jcr:mixinTypes or jcr:primaryType names.
### <a name="bundling-pattern"></a> Bundling Pattern
Bundling pattern is a multi value value property. The pattern elements are list of relative node paths which should be
-bundled as part of _bundling root_. The relative node paths can be of following type
+bundled as part of _bundling root_. The relative node paths can be of following type:
* Static - Like 'jcr:content', 'jcr:content/metadata'.
* Wildcard _(Experimental Feature)_ - Like 'jcr:content/renditions/**'. This would bundle all nodes under relative
@@ -117,7 +119,7 @@ Lets take an example of `nt:file` node l
![Bundling Nodes](node-bundling-file.png)
-This JCR node structure would be stored in Mongo in 2 documents
+This JCR node structure would be stored in MongoDB in 2 documents
{
"_id" : "2:/content/book.jpg",
@@ -153,7 +155,7 @@ Now with bundling pattern like
+ nt:file (oak:Unstructured)
- pattern = ["jcr:content"]
-Would bundle the 2 nodes in nt:file node structure in same Mongo Document
+Would bundle the 2 nodes in nt:file node structure in same MongoDB Document
{
"_id" : "2:/content/book.jpg",
@@ -176,7 +178,7 @@ Would bundle the 2 nodes in nt:file node
"_modCount" : NumberLong(1)
}
-So with bundling 1 nt:file would create 1 Mongo Document. 10M nt:file instance would create 10M Mongo documents
+So with bundling 1 nt:file would create 1 MongoDB Document. 10M nt:file instance would create 10M MongoDB documents
instead of 20M (without bundling)
### <a name="bundling-usage-file"></a> Bundling app:Asset
@@ -207,7 +209,7 @@ Above structure has following characteri
can have unbounded content
* Static and bounded structure take upto ~15 JCR Nodes (assuming 5 types of renditions)
-So 1 asset ~ 15 JCR Nodes and ~ 15 Mongo documents. Thus by default 10M assets would lead to 150M+ Mongo Documents.
+So 1 asset ~ 15 JCR Nodes and ~ 15 MongoDB documents. Thus by default 10M assets would lead to 150M+ MongoDB Documents.
Such a structure can make use of Node Bundling to reduce this storage ratio.
Lets define a bundling pattern like below
@@ -220,7 +222,7 @@ Lets define a bundling pattern like belo
+ app:Asset
- pattern = ["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]
-With this bundling pattern same app:Asset structure would be stored in 1 Mongo Document excluding 'comments' and 'xmp'
+With this bundling pattern same app:Asset structure would be stored in 1 MongoDB Document excluding 'comments' and 'xmp'
nodes
{
@@ -261,12 +263,12 @@ nodes
## <a name="bundling-design-considerations"></a> Design Considerations
-While enabling bundling consider following points
+While enabling bundling consider following points:
**Enable bundling only for static and bounded relative node paths**
-As bundled nodes are stored in single Mongo Document care must be taken such that Bundled Document size is within
-reasonable limits otherwise Mongo (or RDB) would reject such heavy documents. So bundling pattern should only include
+As bundled nodes are stored in single MongoDB Document care must be taken such that bundled Document size is within
+reasonable limits otherwise MongoDB (or RDB) would reject such heavy documents. So bundling pattern should only include
those relative node paths which are static or bounded.
For example in app:Asset it would be wrong to bundle nodes under 'jcr:content/comments' as comments can be unlimited and
@@ -279,7 +281,7 @@ So take into account the content structu
If the content structure is mostly made up of nodes of type `nt:unstrcutured` or `oak:Unstructured` try to identify
subtree which have consistent structure and define a marker mixin to mark such subtrees. Then bundling pattern can be
-defined against such mixins
+defined against such mixins.
For more details on how bundling is implemented refer to [OAK-1312][OAK-1312]
@@ -292,14 +294,14 @@ lots of queries for child nodes as JCR l
'jcr:content/renditions. With bundling all those queries are avoided.
* **Reduced number of Documents in persistent store** - Currently for a nodetype like app:Asset where 1 app:Asset = 15 JCR Nodes.
-If we have 10M assets then we would be consuming 150 M documents in Mongo.
-With bundling this ratio can be reduced to say 1-5 then it would reduce actual number of documents in Mongo. Lesser
-number of documents means lesser size for _id and {_modified, _id} index. Lesser index size would allow storing lot more
-Mongo documents as index size is key factor for sizing Mongo setups
+If we have 10M assets then we would be consuming 150 M documents in MongoDB.
+With bundling this ratio can be reduced to say 1-5 then it would reduce actual number of documents in Mongo.
+Fewer documents means reduces size for _id and {_modified, _id} index. Reduced index size allows storing a lot more
+MongoDB documents as index size is key factor for sizing MongoDB setups.
### <a name="bundling-limits"></a> Limitations
Currently bundling logic has no fallback in case bundle document size exceeds the size imposed by persistent store.
-So try to ensure that bundling is limited and does not bundle lots of nodes
+So try to ensure that bundling is limited and does not bundle lots of nodes.
[OAK-1312]: https://issues.apache.org/jira/browse/OAK-1312
\ No newline at end of file
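
Note on the patched text: the app:Asset pattern in the diff (`["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]`) is described as bundling everything except the 'comments' and 'xmp' nodes. The following is a minimal sketch of that selection rule, not Oak's actual matcher. The function name `is_bundled` and the constant `ASSET_PATTERN` are hypothetical. The handling of `**` (covering the prefix node and all of its descendants) is an assumption inferred from the statement that the whole structure ends up in one document.

```python
def is_bundled(relative_path, patterns):
    """Return True if relative_path (relative to the bundling root)
    is covered by one of the bundling patterns.

    Illustrative only: this is not the Oak implementation.
    """
    path_parts = relative_path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.split("/")
        if pattern_parts[-1] == "**":
            # Assumption: 'p/**' covers the node 'p' and everything below it.
            prefix = pattern_parts[:-1]
            if path_parts[:len(prefix)] == prefix:
                return True
        elif path_parts == pattern_parts:
            # Static pattern: exact match on the relative path.
            return True
    return False


# Pattern taken from the app:Asset example in the diff.
ASSET_PATTERN = ["jcr:content/metadata", "jcr:content/renditions/**", "jcr:content"]

for path in ["jcr:content",
             "jcr:content/metadata",
             "jcr:content/renditions/original/jcr:content",
             "jcr:content/comments",
             "jcr:content/metadata/xmp"]:
    print(path, is_bundled(path, ASSET_PATTERN))
```

Under these assumptions the unbounded 'jcr:content/comments' subtree and 'jcr:content/metadata/xmp' stay in their own documents, which matches the design consideration of bundling only static or bounded relative paths.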