Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2017/03/08 10:14:38 UTC
[jira] [Updated] (OAK-5910) Reduce copying of data when reading mmapped records
[ https://issues.apache.org/jira/browse/OAK-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Parvulescu updated OAK-5910:
---------------------------------
Attachment: OAK-5910.patch
Initial patch. Javadocs are still missing on the Segment class, but it proves the idea.
fyi [~mduerig], [~frm].
The benchmarks are very flaky (more a reflection of the state of the benchmarks than of the patch itself):
ConcurrentReadWriteTest on Trunk (three runs):
# ConcurrentReadWriteTest    C    min    10%    50%    90%    max      N
Oak-Segment-Tar              1     55     98    119    145    307    494
Oak-Segment-Tar              1     75     98    118    137    246    504
Oak-Segment-Tar              1     45     93    112    132    221    532
ConcurrentReadWriteTest with patch (three runs):
# ConcurrentReadWriteTest    C    min    10%    50%    90%    max      N
Oak-Segment-Tar              1     40     97    116    135    252    517
Oak-Segment-Tar              1     44    100    121    142    242    493
Oak-Segment-Tar              1     71     92    112    128    256    537
It seems that the patched version looks better over multiple test runs, but even the unpatched version spikes on my machine. The results are therefore borderline useless unless someone can propose a reliable way to run the benchmarks without the spikiness.
> Reduce copying of data when reading mmapped records
> ---------------------------------------------------
>
> Key: OAK-5910
> URL: https://issues.apache.org/jira/browse/OAK-5910
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar
> Reporter: Alex Parvulescu
> Assignee: Alex Parvulescu
> Fix For: 1.8
>
> Attachments: OAK-5910.patch
>
>
> The idea is to reduce the amount of extra byte buffers created when reading mmapped records, if possible pushing the ByteBuffer all the way to the consumer.
> For example, reading a String from a Segment right now means first reading the bytes of the record into a byte array, then creating a string with an encoding (which behind the scenes will copy the byte array again and run it through the decoder). An alternative is to call {{decode}} on the Charset and pass in the ByteBuffer, skipping the intermediate operations.
> There are a few cases of this I included in the patch, but there may be others (like the {{SegmentStream}} which needs a full rewrite).
> Interested in what others think of this!
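
For illustration only, the String case from the description can be sketched as below. This is not code from the patch: the class name DecodeSketch, both method names, and the offset/length parameters are hypothetical, and a direct buffer stands in for a memory-mapped segment. The first method copies the record into a byte array before decoding; the second restricts a duplicate of the buffer to the record and hands it straight to Charset.decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {

    // Copying approach: record bytes are first copied into a temporary array,
    // which the String constructor then decodes.
    static String readViaByteArray(ByteBuffer buffer, int offset, int length) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position/limit untouched
        view.position(offset);
        byte[] bytes = new byte[length];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Direct approach: narrow a duplicate of the buffer to the record and let
    // the Charset decode from it, with no intermediate byte array.
    static String readViaDecode(ByteBuffer buffer, int offset, int length) {
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        // Assumption: a direct buffer stands in for a memory-mapped segment.
        byte[] raw = "xxhello w\u00f6rldyy".getBytes(StandardCharsets.UTF_8);
        ByteBuffer segment = ByteBuffer.allocateDirect(raw.length);
        segment.put(raw);
        segment.flip();

        // The record starts after the 2-byte prefix and ends before the 2-byte suffix.
        int offset = 2;
        int length = raw.length - 4;

        String copied = readViaByteArray(segment, offset, length);
        String decoded = readViaDecode(segment, offset, length);
        System.out.println(copied.equals(decoded));
    }
}
```

Both methods work on a duplicate, so the shared buffer's position and limit are not disturbed. Note that decode still allocates a CharBuffer and toString still builds the String, so this variant only removes the intermediate byte array rather than all copying.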