Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2012/12/07 11:59:21 UTC

[jira] [Comment Edited] (OAK-333) 1000 character path limit in MongoMK

    [ https://issues.apache.org/jira/browse/OAK-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526313#comment-13526313 ] 

Thomas Mueller edited comment on OAK-333 at 12/7/12 10:58 AM:
--------------------------------------------------------------

It would still have the same basic performance problems as indexing the hash, once the path gets too long. 

In (B) above, I described a solution that is somewhat similar, but avoids the problem: instead of shrinking the end of the path, I suggested shrinking the beginning of the path. So if the path exceeds a limit, the first limit/2 characters are replaced with an index, which could actually be the hash code. So

{code}
/a/very/long/path/that/exceeds/a/length/limit
{code}

would be converted to

{code}
<id(/a/very/long/path/that)>/exceeds/a/length/limit
{code}

Instead of a simple hash, I would use a lookup table. This lookup table would normally be empty (as normally there are no long paths). If a path is too long, then the left 50% of the path is stored there, so that each path that starts with /a/very/long/path/that uses the same shorter prefix.

Similar to the name index we use in Jackrabbit, the id of the long prefix could be the hash code of the prefix.

That way, similar paths stay on the same mongo shard.
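
To make the idea concrete, here is a rough sketch of the prefix lookup table in Java. It is only an illustration under assumptions, not MongoMK code: the class name PathShortener, its methods, the <id> marker syntax, and the in-memory HashMap are all hypothetical (a real table would have to be persisted), and the collision handling is a simple probe that I made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the prefix lookup table; not part of MongoMK. */
class PathShortener {

    private final int limit;

    // Lookup table: id -> long prefix. Normally empty, as long paths are rare.
    private final Map<String, String> prefixById = new HashMap<>();

    PathShortener(int limit) {
        this.limit = limit;
    }

    /** Replaces the first limit/2 characters of a too-long path with "<id>". */
    String shorten(String path) {
        if (path.length() <= limit) {
            return path;
        }
        int cut = limit / 2;
        String prefix = path.substring(0, cut);
        return "<" + idFor(prefix) + ">" + path.substring(cut);
    }

    /** Restores the original path (assumes real paths always start with '/'). */
    String expand(String stored) {
        if (!stored.startsWith("<")) {
            return stored;
        }
        int end = stored.indexOf('>');
        String id = stored.substring(1, end);
        return prefixById.get(id) + stored.substring(end + 1);
    }

    /** Number of long prefixes currently stored in the table. */
    int tableSize() {
        return prefixById.size();
    }

    // The id is the hash code of the prefix; if another prefix already
    // owns that id, probe the next value (assumed collision strategy).
    private String idFor(String prefix) {
        int hash = prefix.hashCode();
        while (true) {
            String id = Integer.toHexString(hash);
            String existing = prefixById.putIfAbsent(id, prefix);
            if (existing == null || existing.equals(prefix)) {
                return id;
            }
            hash++;
        }
    }
}
```

With a limit of 44, the 45 character example path above is cut after /a/very/long/path/that, and every other long path that starts with the same 22 characters gets the same id. The sketch shortens only once, so a path whose remainder is still over the limit would need further handling.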
                
                  
> 1000 character path limit in MongoMK
> ------------------------------------
>
>                 Key: OAK-333
>                 URL: https://issues.apache.org/jira/browse/OAK-333
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mk, mongomk
>    Affects Versions: 0.5
>            Reporter: Mete Atamel
>            Assignee: Mete Atamel
>            Priority: Minor
>         Attachments: OAK-333.patch
>
>
> In an infinite loop, try to add nodes one under another to get N0/N1/N2...NN. At some point, the current parent node will not be found and the current commit will fail. I think this happens when the path length exceeds 1000 characters. Is this enough for a path? This way I was able to create only 222 levels in the tree (and my node names were really short: N1, N2 ...)
> There's an automated test for this: NodeExistsCommandMongoTest.testTreeDepth
