Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2015/05/21 22:30:18 UTC

[jira] [Commented] (OAK-2896) Putting many elements into a map results in many small segments.

    [ https://issues.apache.org/jira/browse/OAK-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554989#comment-14554989 ] 

Michael Dürig commented on OAK-2896:
------------------------------------

!https://issues.apache.org/jira/secure/attachment/12734633/OAK-2896.png|width=700!

The graph above shows the effect. Successive insertions of 1000 keys each into the map scale linearly up to roughly 300k keys, where the graph has a kink. This is the point where segment saturation kicks in because the number of segment references maxes out. From here on segments are flushed prematurely (most at only 7kB), which results in large segment graphs. This in turn leads to a lot of memory churn and GC activity, as can be seen from the spikes appearing more and more frequently as the map grows. 
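
The feedback loop can be illustrated with a toy simulation (not Oak code; the 256kB segment size, 64 byte record size and up to 8 references per record are made-up model assumptions, only the 256 reference limit is taken from the report). A segment is flushed when adding a record would either exceed the nominal size or exceed 256 distinct segment references. Once more than 256 segments exist, the reference limit is hit long before the size limit and flushed segments shrink from the full 256kB to a few kB:

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model of segment saturation; all sizes and the refs-per-record
// figure are illustrative assumptions, not taken from the Oak code base.
public class SegmentSaturationModel {
    static final int MAX_REFS = 256;        // reference limit per segment
    static final int MAX_SIZE = 256 * 1024; // nominal segment size in bytes
    static final int RECORD_SIZE = 64;      // assumed size of one record

    // Returns the sizes of all flushed segments, in flush order.
    public static List<Integer> simulate(int records, Random rnd) {
        List<Integer> flushedSizes = new ArrayList<>();
        Set<Integer> refs = new HashSet<>(); // distinct refs of current segment
        int size = 0;                        // bytes in current segment
        for (int i = 0; i < records; i++) {
            // Each record references up to 8 random earlier segments,
            // mimicking a map update touching already persisted segments.
            Set<Integer> recordRefs = new HashSet<>();
            for (int r = 0; r < 8 && !flushedSizes.isEmpty(); r++) {
                recordRefs.add(rnd.nextInt(flushedSizes.size()));
            }
            int newRefs = 0;
            for (int id : recordRefs) {
                if (!refs.contains(id)) newRefs++;
            }
            if (refs.size() + newRefs > MAX_REFS || size + RECORD_SIZE > MAX_SIZE) {
                flushedSizes.add(size);  // flush, possibly prematurely
                refs = new HashSet<>(recordRefs);
                size = RECORD_SIZE;
            } else {
                refs.addAll(recordRefs);
                size += RECORD_SIZE;
            }
        }
        return flushedSizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = simulate(1_500_000, new Random(42));
        System.out.println("first flushed segment: " + sizes.get(0) + " bytes");
        System.out.println("last flushed segment: " + sizes.get(sizes.size() - 1) + " bytes");
    }
}
{code}

In this model the first segments flush at the full 256kB, while segments flushed after saturation are only a few kB, matching the order of magnitude of the 7kB segments observed above.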

> Putting many elements into a map results in many small segments. 
> -----------------------------------------------------------------
>
>                 Key: OAK-2896
>                 URL: https://issues.apache.org/jira/browse/OAK-2896
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segmentmk
>            Reporter: Michael Dürig
>              Labels: performance
>             Fix For: 1.3.0
>
>         Attachments: OAK-2896.png
>
>
> There is an issue with how the HAMT implementation ({{SegmentWriter.writeMap()}}) interacts with the limit of 256 segment references when putting many entries into a map: the limit is regularly reached once the map contains about 200k entries. At that point segments get flushed prematurely, resulting in more segments, thus more references and thus even smaller segments. It is common for segments to be as small as 7k, with a tar file containing up to 35k segments. This is problematic because at that point handling of the segment graph becomes expensive, both memory and CPU wise. I have seen persisted segment graphs as big as 35M where the usual size is a couple of kilobytes. 
> As the HAMT map is used for storing the children of a node, this might have an adverse effect on nodes with many child nodes. 
> The following code can be used to reproduce the issue: 
> {code}
> SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
> Random rnd = new Random();
> MapRecord baseMap = null;
> for (;;) {
>     // Batch up 1000 random entries ...
>     Map<String, RecordId> map = newHashMap();  // Guava's Maps.newHashMap()
>     for (int k = 0; k < 1000; k++) {
>         RecordId stringId = writer.writeString(String.valueOf(rnd.nextLong()));
>         map.put(String.valueOf(rnd.nextLong()), stringId);
>     }
>     // ... and time how long writing them to the map takes
>     Stopwatch w = Stopwatch.createStarted();
>     baseMap = writer.writeMap(baseMap, map);
>     System.out.println(baseMap.size() + " " + w.elapsed());
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)