Posted to oak-issues@jackrabbit.apache.org by "Francesco Mari (JIRA)" <ji...@apache.org> on 2016/10/11 12:25:21 UTC

[jira] [Updated] (OAK-4923) Improve segment deserialization performance

     [ https://issues.apache.org/jira/browse/OAK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-4923:
--------------------------------
    Attachment: OAK-4923-02.patch
                OAK-4923-01.patch

In the first version of the attached patch, I added two LRU caches shared by all segments; previously loaded data is cached by segment ID. The second version of the patch caches the read data per segment instead, exploiting the fact that only segments currently in use are supposed to be retained in memory. In my local benchmarks, the first version performs better than the second.
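
To make the first approach more concrete, here is a rough sketch of the kind of shared, segment-ID-keyed LRU cache I have in mind. The class, key type and method names are illustrative only and do not necessarily match the attached patch:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

// Illustrative only: an LRU cache shared by all Segment instances and keyed
// by segment ID. The actual patch may use different names, key types and sizing.
class SharedSegmentDataCache<V> {

    private final Map<UUID, V> cache;

    SharedSegmentDataCache(int maxEntries) {
        // accessOrder = true gives least-recently-used eviction order.
        this.cache = new LinkedHashMap<UUID, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value for the given segment ID, computing and
    // storing it on a miss. get() updates the access order on a hit.
    synchronized V getOrCompute(UUID segmentId, Function<UUID, V> loader) {
        V value = cache.get(segmentId);
        if (value == null) {
            value = loader.apply(segmentId);
            cache.put(segmentId, value);
        }
        return value;
    }
}
{code}

Two such caches, one for the referenced segments and one for the record number offsets, would then be consulted by {{readReferencedSegments}} and {{readRecordNumberOffsets}} instead of re-reading the backing buffer on every call.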

These patches are only proofs of concept; they are not finished by any means. [~mduerig], what do you think?

> Improve segment deserialization performance
> -------------------------------------------
>
>                 Key: OAK-4923
>                 URL: https://issues.apache.org/jira/browse/OAK-4923
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>         Attachments: OAK-4923-01.patch, OAK-4923-02.patch
>
>
> The methods {{readReferencedSegments}} and {{readRecordNumberOffsets}} in {{Segment}} compute the returned data every time they are called. While this is a very clean implementation, it might trigger unexpected I/O operations, since the buffer the data is read from is usually backed by a memory-mapped file.
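
As an illustration of the per-segment variant, the parsed data could be memoized directly in the segment instance, so it is read from the mapped buffer at most once for as long as the segment is retained in memory. This is only a sketch; the actual {{Segment}} layout, field names and parsing logic differ:

{code:java}
import java.nio.ByteBuffer;

// Illustrative only: a segment-like holder that parses its record number
// offsets from the backing buffer once and memoizes the result for as long
// as the instance is retained in memory.
class MemoizingSegment {

    private final ByteBuffer data;

    // Lazily initialized; in practice parsed at most once per instance.
    private volatile int[] recordNumberOffsets;

    MemoizingSegment(ByteBuffer data) {
        this.data = data;
    }

    int[] readRecordNumberOffsets() {
        int[] offsets = recordNumberOffsets;
        if (offsets == null) {
            offsets = parseOffsets();
            // Benign race: concurrent callers parse the same immutable data.
            recordNumberOffsets = offsets;
        }
        return offsets;
    }

    private int[] parseOffsets() {
        // Placeholder layout: an int count followed by that many int offsets.
        ByteBuffer buffer = data.duplicate();
        int count = buffer.getInt();
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = buffer.getInt();
        }
        return offsets;
    }
}
{code}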



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)