Posted to oak-dev@jackrabbit.apache.org by Marcel Reutegger <mr...@adobe.com> on 2013/10/24 09:35:28 UTC

[MongoMK] flag document with children

Hi,

Yesterday Chetan, Thomas and I discussed an access pattern Chetan
saw with the MongoMK. Installing a vlt package shows many reads on
MongoDB for children of nodes that do not have child nodes. These
nodes are the leaves of the tree. Due to the current content model,
the MongoMK cannot tell whether a node has children just by looking
at the NodeDocument for that node. We then quickly discussed that
we could mark each leaf node once we discover that it is a leaf.

I was thinking about this a bit more and came to the conclusion that
it is probably better to mark the nodes with children, instead of the
leaf nodes. There are a number of advantages:

- It works better with concurrent writes. Once set, the flag usually does
not change back unless the MongoMK GC actually removes a document
in MongoDB.
- Less storage overhead. There are usually fewer parents than leaf nodes.
- It does not have to be 100% accurate. False positives are OK (the document
says it has children, but the query then returns none).
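
A minimal sketch of the read path this proposal implies, written in Python
for brevity (the MongoMK itself is Java). The `FakeStore` class, the
`has_children` field name and the `query_children` helper are all
hypothetical and not the actual MongoMK API. The sketch only illustrates
that a missing flag lets the reader skip the child query, while a stale
flag merely costs one query that returns nothing.

```python
class FakeStore:
    """In-memory stand-in for MongoDB that counts child queries (hypothetical)."""

    def __init__(self):
        self.docs = {}          # path -> document dict
        self.child_queries = 0  # number of simulated round trips for children

    def query_children(self, path):
        self.child_queries += 1
        prefix = path.rstrip("/") + "/"
        # Direct children only: the remainder after the prefix has no slash.
        return sorted(
            p for p in self.docs
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


def get_children(store, path):
    """Return child paths, skipping the query when the parent is not flagged."""
    doc = store.docs.get(path, {})
    if not doc.get("has_children", False):
        return []  # flag absent: treat as a leaf, no round trip
    # Flag set: the query may still return [] (false positive), which is harmless.
    return store.query_children(path)


store = FakeStore()
store.docs["/foo"] = {"has_children": True}
store.docs["/foo/bar"] = {}                    # leaf, never flagged
store.docs["/stale"] = {"has_children": True}  # children were removed later

children_of_foo = get_children(store, "/foo")
children_of_bar = get_children(store, "/foo/bar")
children_of_stale = get_children(store, "/stale")
```

In this sketch only the two flagged documents trigger a child query; the
unflagged leaf is answered without touching the store.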

WDYT?

Regards
 Marcel

Re: [MongoMK] flag document with children

Posted by Chetan Mehrotra <ch...@gmail.com>.
I have implemented the above logic as part of OAK-1117 [1]. With this
in place, the number of calls made to Mongo on a restart of Adobe CQ goes
down from 42000 to 25000 (roughly 40% fewer), significantly reducing the
startup time when Mongo is remote!

regards
Chetan
[1] https://issues.apache.org/jira/browse/OAK-1117
Chetan Mehrotra


On Thu, Oct 24, 2013 at 3:23 PM, Chetan Mehrotra
<ch...@gmail.com> wrote:
> I am trying to prototype an approach. Would come up with a patch for
> this soon. So far I was going with the reverse approach whereby when I
> fetch a node I retrieve some extra child rows [1] in same call to
> determine if it has any children.
>
> But given that number of read would far exceed number of writes it
> would be better to perform extra update call. I would try to come up
> with a patch for this
>
> regards
> Chetan
> [1] by adding an or clause to fetch node with id say "^2:/foo/.*" to
> fetch child node for a parent with id "1:/foo".
> Chetan Mehrotra

Re: [MongoMK] flag document with children

Posted by Chetan Mehrotra <ch...@gmail.com>.
I am trying to prototype an approach and will come up with a patch for
this soon. So far I was going with the reverse approach, whereby when I
fetch a node I retrieve some extra child rows [1] in the same call to
determine if it has any children.

But given that the number of reads would far exceed the number of writes,
it would be better to perform an extra update call. I will try to come up
with a patch for this.

regards
Chetan
[1] by adding an or clause that matches ids such as "^2:/foo/.*", to
fetch the child nodes of a parent with id "1:/foo".
Chetan Mehrotra
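
The "reverse approach" from footnote [1] can be sketched as follows, in
Python for brevity. This is only an illustration under assumptions: the
`<depth>:<path>` id scheme is inferred from the two ids quoted above, the
filter is a plain dict in MongoDB query syntax (`$or`, `$regex`), and the
helper names `node_id`, `node_with_children_filter` and `matches` are
hypothetical. Nothing here talks to a real MongoDB; `matches` just
evaluates the filter locally to show which ids it would select.

```python
import re


def node_id(path):
    """Build an id of the form '<depth>:<path>', e.g. '1:/foo' (assumed scheme)."""
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}:{path}"


def node_with_children_filter(path):
    """MongoDB-style filter matching the node itself or any direct child."""
    depth = 0 if path == "/" else path.count("/")
    prefix = "/" if path == "/" else path + "/"
    # Children live one level deeper and share the parent's path as prefix.
    child_pattern = "^" + re.escape(f"{depth + 1}:{prefix}") + ".*"
    return {"$or": [{"_id": node_id(path)}, {"_id": {"$regex": child_pattern}}]}


def matches(filter_doc, doc_id):
    """Evaluate the tiny subset of filter syntax used above (illustration only)."""
    for clause in filter_doc["$or"]:
        cond = clause["_id"]
        if isinstance(cond, dict):
            if re.search(cond["$regex"], doc_id):
                return True
        elif cond == doc_id:
            return True
    return False


ids = ["1:/foo", "2:/foo/bar", "2:/foo/baz", "3:/foo/bar/qux",
       "2:/foobar/x", "1:/other"]
query = node_with_children_filter("/foo")
hits = [i for i in ids if matches(query, i)]
has_children = len(hits) > 1  # more than the node itself came back
```

The cost of this variant is paid on every read, which is why the extra
update on the write side looked preferable.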


On Thu, Oct 24, 2013 at 3:08 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> Yes, you are right. It should be relatively easy to implement (low risk).
>
> Regards,
> Thomas

Re: [MongoMK] flag document with children

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Yes, you are right. It should be relatively easy to implement (low risk).

Regards,
Thomas


On 10/24/13 10:12 AM, "Marcel Reutegger" <mr...@adobe.com> wrote:

>> The disadvantage is, when a node is added, either:
>>
>> - then the parent needs to be checked whether is already has this flag set
>> (if it is in the cache), or
>
>I'd say a parent node is likely in the cache because oak will read it
>first before it is able to add a child.
>
>> - the parent needs to be updated to set the flag
>
>that's correct. though you only have to do it when it isn't set already.
>and the check should be cheap in most cases, because the node is in the cache.
>
>regards
> marcel
>


RE: [MongoMK] flag document with children

Posted by Marcel Reutegger <mr...@adobe.com>.
> The disadvantage is, when a node is added, either:
> 
> - then the parent needs to be checked whether is already has this flag set
> (if it is in the cache), or

I'd say a parent node is likely in the cache because Oak will read it first,
before it is able to add a child.

> - the parent needs to be updated to set the flag

That's correct, though you only have to do it when it isn't set already. And
the check should be cheap in most cases, because the node is in the cache.
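
A minimal sketch of that write path, in Python for brevity. All names
(`CountingStore`, `set_has_children`, `add_node`, the `has_children`
field) are hypothetical and not the MongoMK API, and the cache is assumed
to be a plain dict that shares document objects with the store. The point
is only that the extra update is issued once per parent, not once per
added child.

```python
class CountingStore:
    """In-memory stand-in for the document store that counts flag updates."""

    def __init__(self):
        self.docs = {"/": {}}
        self.flag_updates = 0

    def set_has_children(self, path):
        self.docs[path]["has_children"] = True
        self.flag_updates += 1


def parent_of(path):
    head = path.rsplit("/", 1)[0]
    return head or "/"


def add_node(store, cache, path):
    """Create a node and flag its parent, but only if the flag is not set yet."""
    parent = parent_of(path)
    # The parent is read before a child can be added, so it is expected
    # to be in the cache; fall back to the store otherwise.
    parent_doc = cache.get(parent)
    if parent_doc is None:
        parent_doc = store.docs[parent]
        cache[parent] = parent_doc
    if not parent_doc.get("has_children", False):
        store.set_has_children(parent)  # the one extra update call
    store.docs[path] = {}
    cache[path] = store.docs[path]


store = CountingStore()
cache = {}
for new_path in ["/a", "/b", "/c", "/a/x", "/a/y"]:
    add_node(store, cache, new_path)
```

Five nodes are added here, but only two parents ("/" and "/a") ever need
the flag update; every later sibling finds the flag already set in the
cached parent document.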

regards
 marcel


Re: [MongoMK] flag document with children

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

That sounds good to me.

The disadvantage is that, when a node is added, either:

- the parent needs to be checked for whether it already has this flag set
(if it is in the cache), or
- the parent needs to be updated to set the flag

I wouldn't worry too much about resetting the flag, except when deleting
the node with the flag itself.

Regards,
Thomas


