Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2017/06/14 11:52:00 UTC
[jira] [Updated] (OAK-6339) MapRecord#getKeys should initialize child iterables lazily
[ https://issues.apache.org/jira/browse/OAK-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dürig updated OAK-6339:
-------------------------------
Attachment: OAK-6339-1.6.patch
Attached patch [^OAK-6339-1.6.patch] (for the 1.6 branch) fixes the problem; with it applied I was able to run the script successfully.
> MapRecord#getKeys should initialize child iterables lazily
> ----------------------------------------------------------
>
> Key: OAK-6339
> URL: https://issues.apache.org/jira/browse/OAK-6339
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Chetan Mehrotra
> Assignee: Michael Dürig
> Priority: Minor
> Fix For: 1.8
>
> Attachments: OAK-6339-1.6.patch
>
>
> Recently we saw an OutOfMemoryError using the [oakRepoStats|https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/repostats] script with a SegmentNodeStore setup where the uuid index has 16M+ entries, creating a very flat hierarchy. The error occurred while computing the Tree#getChildren iterator, which internally invokes MapRecord#getKeys to obtain an iterable of child node names.
> This happened because the code in getKeys builds the key list eagerly: it calls bucket.getKeys(), which recursively does the same for each child bucket, so the entire map is evaluated up front.
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<String>> keys =
>             newArrayListWithCapacity(buckets.size());
>     for (MapRecord bucket : buckets) {
>         keys.add(bucket.getKeys());
>     }
>     return concat(keys);
> }
> {code}
> Instead, getKeys should use the same approach as MapRecord#getEntries, i.e. evaluate the iterables for the child buckets lazily:
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<MapEntry>> entries =
>             newArrayListWithCapacity(buckets.size());
>     for (final MapRecord bucket : buckets) {
>         entries.add(new Iterable<MapEntry>() {
>             @Override
>             public Iterator<MapEntry> iterator() {
>                 return bucket.getEntries(diffKey, diffValue).iterator();
>             }
>         });
>     }
>     return concat(entries);
> }
> {code}
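> Applying the getEntries pattern to getKeys would look roughly like the following (a sketch for illustration only, not the attached patch; it assumes the same surrounding MapRecord fields and helpers as the snippets above). Each child bucket's keys are now fetched only when the corresponding iterator() is actually invoked, so concat returns a lazy view instead of forcing the whole subtree:
> {code}
> if (isBranch(size, level)) {
>     List<MapRecord> buckets = getBucketList(segment);
>     List<Iterable<String>> keys =
>             newArrayListWithCapacity(buckets.size());
>     for (final MapRecord bucket : buckets) {
>         // Defer bucket.getKeys() until iteration instead of calling it eagerly
>         keys.add(new Iterable<String>() {
>             @Override
>             public Iterator<String> iterator() {
>                 return bucket.getKeys().iterator();
>             }
>         });
>     }
>     return concat(keys);
> }
> {code}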
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)