Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2016/06/15 05:31:09 UTC

[jira] [Comment Edited] (OAK-1312) Bundle nodes into a document

    [ https://issues.apache.org/jira/browse/OAK-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329334#comment-15329334 ] 

Chetan Mehrotra edited comment on OAK-1312 at 6/15/16 5:30 AM:
---------------------------------------------------------------

Had a discussion on this with [~mreutegg] today. Some food for thought:

* Make use of NodeType and mixin (as suggested above) to determine which nodes can be collapsed
* Make use of the approach we took in oak-lucene, where support for relative properties was implemented by adding the relative property path as a field name in the parent Lucene Document. So if for app:Asset we need to index jcr:content/metadata/format, then for the Lucene document created for the app:Asset node we create a field with name jcr:content/metadata/format.

So let's take an example of nt:file
{noformat}
/content/image1/original (nt:file)
        + jcr:content
          - jcr:data = ...
{noformat}


# *Write flow* - UpdateOp would create property names based on the relative path of the property being collapsed. For the above case that would be jcr:content/jcr:data, jcr:content/jcr:primaryType and jcr:primaryType. So if a new node is detected which by its mixin is eligible for bundled storage, then the diff would read all child nodes and bundle their properties in the host node. This would result in a single NodeDocument with id /content/image1/original holding the given relative properties
# *Read Flow* - In all cases read would be via traversal, i.e. a read at an arbitrary path should not happen. So when the original DocumentNodeState is read and code asks for a specific child node or all children, it can create further NodeStates based on the embedded relative properties
# *Update Flow* - Let's say jcr:data is updated; then as the commit diff traverses and attempts to create the UpdateOp, it would have to detect whether the property is part of a bundled node tree or not
# *Observation Flow* - TBD
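
To make the write, read and update flows a bit more concrete, here is a rough sketch using plain Java collections. None of the names below (BundlingSketch, Node, bundle, readProps, childNames, isBundled) are Oak classes or methods; they are made up for illustration, and the nt:resource type of jcr:content is an assumption (the example above does not state it). It only shows how relative property names could be produced on write and how child states could be rebuilt from them on read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; none of these types are actual Oak classes. */
final class BundlingSketch {

    /** Hypothetical in-memory node: properties plus named child nodes. */
    static final class Node {
        final Map<String, String> props = new LinkedHashMap<>();
        final Map<String, Node> children = new LinkedHashMap<>();
    }

    /** Write flow: collapse a subtree into one flat map keyed by relative property path. */
    static Map<String, String> bundle(Node host) {
        Map<String, String> doc = new LinkedHashMap<>();
        collect(host, "", doc);
        return doc;
    }

    private static void collect(Node node, String prefix, Map<String, String> doc) {
        for (Map.Entry<String, String> p : node.props.entrySet()) {
            doc.put(prefix + p.getKey(), p.getValue());
        }
        for (Map.Entry<String, Node> c : node.children.entrySet()) {
            collect(c.getValue(), prefix + c.getKey() + "/", doc);
        }
    }

    /** Read flow: properties of the node at relPath ("" is the host itself). */
    static Map<String, String> readProps(Map<String, String> doc, String relPath) {
        String prefix = relPath.isEmpty() ? "" : relPath + "/";
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue;
            String rest = e.getKey().substring(prefix.length());
            if (rest.indexOf('/') < 0) { // no further path segment: a direct property
                props.put(rest, e.getValue());
            }
        }
        return props;
    }

    /** Read flow: names of the direct children of the node at relPath. */
    static Set<String> childNames(Map<String, String> doc, String relPath) {
        String prefix = relPath.isEmpty() ? "" : relPath + "/";
        Set<String> names = new TreeSet<>();
        for (String key : doc.keySet()) {
            if (!key.startsWith(prefix)) continue;
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash > 0) names.add(rest.substring(0, slash));
        }
        return names;
    }

    /** Update flow: is the changed property stored in the bundled document? */
    static boolean isBundled(Map<String, String> doc, String relPropertyPath) {
        return doc.containsKey(relPropertyPath);
    }

    /** Builds the nt:file example from above. */
    static Node sampleFile() {
        Node content = new Node();
        content.props.put("jcr:primaryType", "nt:resource"); // assumed type, not given above
        content.props.put("jcr:data", "...");
        Node file = new Node();
        file.props.put("jcr:primaryType", "nt:file");
        file.children.put("jcr:content", content);
        return file;
    }

    public static void main(String[] args) {
        Map<String, String> doc = bundle(sampleFile());
        System.out.println(doc.keySet());
        System.out.println(childNames(doc, ""));
        System.out.println(readProps(doc, "jcr:content").keySet());
    }
}
```

Note that in this sketch a child node is only discoverable through its properties. That is probably fine since every node carries jcr:primaryType, but a real implementation may want an explicit marker for bundled children.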

*Bundling Approach* 

To start with we can implement an approach where for any marked node (marked for bundling) the complete subtree under it is bundled. This would make the implementation simpler.

Later we can also try an approach similar to the one taken for index time aggregation. The application can provide the node paths which need to be bundled via config, and then while creating nodes we make use of that to determine the host node in which all such content needs to be bundled.
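
A rough sketch of how such a config-driven lookup might work, again with plain Java collections. The config format (node type mapped to a set of relative paths) and the names BundleConfigSketch, sampleConfig and findHost are assumptions made up for this example, not an existing Oak API. The idea is to walk up the ancestors of a path and pick the first ancestor whose node type lists the remaining relative path as bundled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; the config format here is hypothetical. */
final class BundleConfigSketch {

    /** Hypothetical config: node type -> relative paths bundled into a node of that type. */
    static Map<String, Set<String>> sampleConfig() {
        Map<String, Set<String>> config = new HashMap<>();
        config.put("nt:file", new HashSet<>(Arrays.asList("jcr:content")));
        config.put("app:Asset", new HashSet<>(Arrays.asList("jcr:content", "jcr:content/metadata")));
        return config;
    }

    /**
     * Returns the path of the host node that 'path' would be bundled into,
     * or null if no ancestor claims it. typeByPath maps absolute paths of
     * known nodes to their primary type. The root node is never a host here.
     */
    static String findHost(String path, Map<String, String> typeByPath,
                           Map<String, Set<String>> config) {
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            String ancestor = path.substring(0, slash);
            String relPath = path.substring(slash + 1);
            String type = typeByPath.get(ancestor);
            if (type != null
                    && config.getOrDefault(type, Collections.<String>emptySet()).contains(relPath)) {
                return ancestor;
            }
            slash = path.lastIndexOf('/', slash - 1); // move up one level
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> types = new HashMap<>();
        types.put("/content/image1/original", "nt:file");
        System.out.println(findHost("/content/image1/original/jcr:content", types, sampleConfig()));
    }
}
```

The same lookup could also serve the update flow: if findHost returns a host for the parent of a changed property, the change would go to the host document under the relative property name rather than to a document of its own.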


> Bundle nodes into a document
> ----------------------------
>
>                 Key: OAK-1312
>                 URL: https://issues.apache.org/jira/browse/OAK-1312
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, documentmk
>            Reporter: Marcel Reutegger
>            Assignee: Chetan Mehrotra
>              Labels: performance
>             Fix For: 1.6
>
>
> For very fine-grained content with many nodes and only a few properties per node it would be more efficient to bundle multiple nodes into a single MongoDB document. Mostly reading would benefit because there are fewer roundtrips to the backend. At the same time the storage footprint would be lower because metadata overhead is per document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)