Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2017/06/14 11:52:00 UTC
[jira] [Updated] (OAK-6339) MapRecord#getKeys should initialize child iterables lazily
[ https://issues.apache.org/jira/browse/OAK-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dürig updated OAK-6339:
-------------------------------
Attachment: OAK-6339-1.6.patch
Attached patch [^OAK-6339-1.6.patch] (for the 1.6 branch) fixes the problem; with it applied I was able to run the script successfully.
> MapRecord#getKeys should initialize child iterables lazily
> ----------------------------------------------------------
>
> Key: OAK-6339
> URL: https://issues.apache.org/jira/browse/OAK-6339
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Chetan Mehrotra
> Assignee: Michael Dürig
> Priority: Minor
> Fix For: 1.8
>
> Attachments: OAK-6339-1.6.patch
>
>
> Recently we saw an OutOfMemoryError using the [oakRepoStats|https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/repostats] script with a SegmentNodeStore setup where the uuid index has 16M+ entries, creating a very flat hierarchy. The error occurred while computing the Tree#getChildren iterator, which internally invokes MapRecord#getKeys to obtain an iterable of child node names.
> This happened because the code in getKeys builds the key list eagerly: it calls bucket.getKeys(), which recursively does the same for each child bucket, so the entire map is evaluated up front.
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<String>> keys =
>             newArrayListWithCapacity(buckets.size());
>     for (MapRecord bucket : buckets) {
>         keys.add(bucket.getKeys());
>     }
>     return concat(keys);
> }
> {code}
> Instead, getKeys should use the same approach as MapRecord#getEntries, i.e. evaluate the iterables for the child buckets lazily:
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<MapEntry>> entries =
>             newArrayListWithCapacity(buckets.size());
>     for (final MapRecord bucket : buckets) {
>         entries.add(new Iterable<MapEntry>() {
>             @Override
>             public Iterator<MapEntry> iterator() {
>                 return bucket.getEntries(diffKey, diffValue).iterator();
>             }
>         });
>     }
>     return concat(entries);
> }
> {code}
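> Applying the getEntries pattern to getKeys would look roughly like the following (a sketch for illustration only, not the attached patch; it assumes the same surrounding MapRecord fields and helpers as the snippets above). Each child bucket's keys are now fetched only when the corresponding iterator() is actually invoked, so concat returns a lazy view instead of forcing the whole subtree:
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<String>> keys =
>             newArrayListWithCapacity(buckets.size());
>     for (final MapRecord bucket : buckets) {
>         // Defer bucket.getKeys() until iteration instead of calling it eagerly
>         keys.add(new Iterable<String>() {
>             @Override
>             public Iterator<String> iterator() {
>                 return bucket.getKeys().iterator();
>             }
>         });
>     }
>     return concat(keys);
> }
> {code}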
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)