Posted to oak-issues@jackrabbit.apache.org by "Davide Giannella (JIRA)" <ji...@apache.org> on 2017/07/06 15:35:06 UTC

[jira] [Closed] (OAK-6339) MapRecord#getKeys should initialize child iterables lazily

     [ https://issues.apache.org/jira/browse/OAK-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-6339.
---------------------------------

Bulk close for 1.7.3

> MapRecord#getKeys should initialize child iterables lazily
> ----------------------------------------------------------
>
>                 Key: OAK-6339
>                 URL: https://issues.apache.org/jira/browse/OAK-6339
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Chetan Mehrotra
>            Assignee: Michael Dürig
>            Priority: Minor
>              Labels: candidate_oak_1_6
>             Fix For: 1.8, 1.7.3
>
>         Attachments: OAK-6339-1.6.patch
>
>
> Recently we saw an OutOfMemoryError when running the [oakRepoStats|https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/repostats] script against a SegmentNodeStore setup where the uuid index has 16M+ entries, creating a very flat hierarchy. The error occurred while computing the Tree#getChildren iterator, which internally invokes MapRecord#getKeys to obtain an iterable of child node names.
> This happens because getKeys builds the key list eagerly: it calls bucket.getKeys() for each child bucket, which recursively does the same for its own children, forcing eager evaluation of the entire subtree.
> {code}
>         if (isBranch(size, level)) {
>             List<MapRecord> buckets = getBucketList(segment);
>             List<Iterable<String>> keys =
>                     newArrayListWithCapacity(buckets.size());
>             for (MapRecord bucket : buckets) {
>                 keys.add(bucket.getKeys());
>             }
>             return concat(keys);
>         }
> {code}
> Instead we should use the same approach as MapRecord#getEntries, i.e. evaluate the iterables for the child buckets lazily:
> {code}
>         if (isBranch(size, level)) {
>             List<MapRecord> buckets = getBucketList(segment);
>             List<Iterable<MapEntry>> entries =
>                     newArrayListWithCapacity(buckets.size());
>             for (final MapRecord bucket : buckets) {
>                 entries.add(new Iterable<MapEntry>() {
>                     @Override
>                     public Iterator<MapEntry> iterator() {
>                         return bucket.getEntries(diffKey, diffValue).iterator();
>                     }
>                 });
>             }
>             return concat(entries);
>         }
> {code}
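Applied to getKeys, the fix amounts to wrapping each child bucket in an Iterable that defers the bucket.getKeys() call until iteration actually reaches it. The following is a minimal, self-contained sketch of that pattern; Bucket, lazyKeys, and the hand-rolled concat are hypothetical stand-ins for MapRecord, MapRecord#getKeys, and Guava's Iterables.concat, used here only to demonstrate the deferred evaluation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for MapRecord; tracks whether getKeys() was invoked.
class Bucket {
    private final List<String> keys;
    boolean evaluated = false;

    Bucket(String... keys) {
        this.keys = Arrays.asList(keys);
    }

    Iterable<String> getKeys() {
        evaluated = true; // in the real code this is where child segments get read
        return keys;
    }
}

public class LazyKeys {

    // Lazy variant: each bucket's getKeys() runs only once iteration reaches it.
    static Iterable<String> lazyKeys(List<Bucket> buckets) {
        List<Iterable<String>> keys = new ArrayList<>(buckets.size());
        for (Bucket bucket : buckets) {
            // Iterable has a single abstract method, so a lambda defers the call
            keys.add(() -> bucket.getKeys().iterator());
        }
        return concat(keys);
    }

    // Minimal stand-in for Guava's Iterables.concat: chains the parts in order,
    // pulling each part's iterator only when the previous one is exhausted.
    static <T> Iterable<T> concat(List<Iterable<T>> parts) {
        return () -> new Iterator<T>() {
            final Iterator<Iterable<T>> outer = parts.iterator();
            Iterator<T> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!inner.hasNext() && outer.hasNext()) {
                    inner = outer.next().iterator();
                }
                return inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }

    public static void main(String[] args) {
        Bucket a = new Bucket("a1", "a2");
        Bucket b = new Bucket("b1");
        Iterable<String> keys = lazyKeys(Arrays.asList(a, b));
        // Building the iterable touches no bucket; only iteration does.
        System.out.println("before iteration: " + a.evaluated + ", " + b.evaluated);
        System.out.println("first key: " + keys.iterator().next());
    }
}
```

With millions of flat children, the eager version materializes one iterable per bucket for the whole subtree up front, while the lazy version holds only lightweight wrappers and reads each bucket as iteration reaches it.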



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)