Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2014/06/29 15:55:24 UTC

[jira] [Updated] (OAK-1925) Use streamed io instead of RandomAccessFile in TarWriter

     [ https://issues.apache.org/jira/browse/OAK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated OAK-1925:
---------------------------------

    Attachment: OAK-1925.patch

Patch for using buffered stream I/O for writes. A couple of observations:

* With this patch, no significant difference is seen in the WikipediaImport benchmark. This is somewhat expected: TarWriter does not write very small chunks of data, so performance should be similar.
* Running an AEM 6 instance with this change shows a minor improvement of 40 sec, from ~360 sec to ~320 sec.
* As reads of written tar entries would no longer interfere with writes, it might give a minor improvement.
* However, there is a downside: the in-memory TarEntry cache would consume up to 256 MB of RAM.

So the numbers give no strong reason to change the existing implementation, but my hunch is that streamed I/O might provide better throughput. Lucene recently replaced all usage of RandomAccessFile with streams (LUCENE-5678).

[~jukkaz] [~alex.parvulescu] Can you have a look?


> Use streamed io instead of RandomAccessFile in TarWriter
> --------------------------------------------------------
>
>                 Key: OAK-1925
>                 URL: https://issues.apache.org/jira/browse/OAK-1925
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segmentmk
>            Reporter: Chetan Mehrotra
>            Priority: Minor
>         Attachments: OAK-1925.patch
>
>
> TarWriter currently uses RandomAccessFile to 
> * Write the tar entries
> * Read written entries
> The writes, however, are currently sequential. It might be better to use streamed buffered I/O for the writes and maintain an in-memory cache of written tar entries to serve the reads.
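
The idea in the description can be illustrated with a rough sketch. This is not the code in OAK-1925.patch: the class StreamedEntryWriter, its methods, and the byte-based eviction policy are all hypothetical, and tar headers and block padding are left out. It only shows sequential writes going through a BufferedOutputStream while recently written entries are served from a size-bounded in-memory map.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and policy are assumptions,
// not the actual TarWriter implementation.
class StreamedEntryWriter implements AutoCloseable {
    private final OutputStream out;
    private final long maxCacheBytes;
    private long cachedBytes = 0;
    // Insertion-ordered map, so the oldest cached entry is evicted first.
    private final Map<String, byte[]> cache = new LinkedHashMap<>();

    StreamedEntryWriter(File file, long maxCacheBytes) throws IOException {
        // Sequential, buffered writes instead of RandomAccessFile.
        this.out = new BufferedOutputStream(new FileOutputStream(file));
        this.maxCacheBytes = maxCacheBytes;
    }

    synchronized void writeEntry(String id, byte[] data) throws IOException {
        out.write(data); // a real tar writer would also emit a header and padding
        byte[] copy = data.clone();
        byte[] previous = cache.put(id, copy);
        if (previous != null) {
            cachedBytes -= previous.length;
        }
        cachedBytes += copy.length;
        // Evict oldest entries until the cache fits the byte budget again.
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (cachedBytes > maxCacheBytes && it.hasNext()) {
            cachedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    // Returns the cached bytes, or null if the entry was evicted or never written.
    synchronized byte[] readEntry(String id) {
        return cache.get(id);
    }

    @Override
    public synchronized void close() throws IOException {
        out.close(); // flushes the buffer before closing the file
    }
}
```

In this sketch a cache miss simply returns null; a real implementation would need some fallback for evicted entries, for example reading them back from the file once it has been flushed. The byte budget corresponds to the 256 MB upper bound mentioned in the comment above.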



--
This message was sent by Atlassian JIRA
(v6.2#6252)