Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2018/03/27 14:41:00 UTC

[jira] [Commented] (OAK-5655) TarMK: Analyse locality of reference

    [ https://issues.apache.org/jira/browse/OAK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415730#comment-16415730 ] 

Michael Dürig commented on OAK-5655:
------------------------------------

At [http://svn.apache.org/viewvc?rev=1827841&view=rev] I added some utility classes to collect IO traces for specific access patterns. Access patterns are specified via a {{Trace}} instance. Currently the only concrete implementation is {{BreathFirstTrace}}, which traverses the first {{n}} levels of a tree in breadth-first order.
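
For illustration, a depth-limited breadth-first traversal can be sketched as follows. This is only a minimal sketch and not the code from the commit: {{BreadthFirstSketch}}, {{Node}} and {{Visit}} are hypothetical names, and it assumes that the root sits at depth 0 and that "the first {{n}} levels" means depths 0 to n-1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a depth-limited breadth-first traversal.
// None of these types are part of Oak.
class BreadthFirstSketch {

    /** Minimal stand-in for a tree node. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** One visited node: its name, its depth and the running node count. */
    static final class Visit {
        final String name;
        final int depth;
        final int count;

        Visit(String name, int depth, int count) {
            this.name = name;
            this.depth = depth;
            this.count = count;
        }
    }

    /** Visits the first maxDepth levels of the tree, one level at a time. */
    static List<Visit> traverse(Node root, int maxDepth) {
        List<Visit> visits = new ArrayList<>();
        List<Node> level = new ArrayList<>();
        level.add(root);
        int count = 0;
        for (int depth = 0; depth < maxDepth && !level.isEmpty(); depth++) {
            List<Node> next = new ArrayList<>();
            for (Node node : level) {
                // Record the node together with the current depth and count
                visits.add(new Visit(node.name, depth, ++count));
                next.addAll(node.children);
            }
            level = next;
        }
        return visits;
    }

    public static void main(String[] args) {
        Node root = new Node("root")
                .add(new Node("a").add(new Node("c")))
                .add(new Node("b"));
        for (Visit v : traverse(root, 2)) {
            System.out.println(v.name + " depth=" + v.depth + " count=" + v.count);
        }
    }
}
```

Processing one level at a time keeps the current depth available without storing it per node, which is convenient when depth is to be reported alongside each IO operation.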

IO traces are collected as CSV files:
{noformat}
timestamp,file,segmentId,length,elapsed,depth,count
1522154516424,data01415a.tar,f81378df-b3f8-4b25-0000-00000002c450,181328,573411,0,1
1522154516441,data01415a.tar,9c2117cb-6eaa-4cf9-0000-00000003ffd0,262096,680192,0,1
1522154516444,data01415a.tar,3fdca869-9272-4b04-0000-00000003ffe0,262112,668914,0,1
....
{noformat}
Here {{depth}} and {{count}} are contributed by {{BreathFirstTrace}} and record the current depth in the tree and the number of nodes traversed so far.
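
A trace in this format is easy to post-process. The following is a rough sketch of parsing the rows and summing the {{length}} column per depth. {{IoTraceCsvSketch}} and {{Entry}} are hypothetical names rather than classes from the commit, and the sketch assumes seven comma-separated fields per row with no quoting or embedded commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch for post-processing an IO trace; not part of Oak.
class IoTraceCsvSketch {

    /** One CSV row, with fields named after the header columns. */
    static final class Entry {
        final long timestamp;
        final String file;
        final String segmentId;
        final long length;
        final long elapsed;
        final int depth;
        final int count;

        Entry(String[] f) {
            timestamp = Long.parseLong(f[0]);
            file = f[1];
            segmentId = f[2];
            length = Long.parseLong(f[3]);
            elapsed = Long.parseLong(f[4]);
            depth = Integer.parseInt(f[5]);
            count = Integer.parseInt(f[6]);
        }
    }

    /** Parses trace lines, skipping blank lines and the header row. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("timestamp,")) {
                continue;
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 7) {
                throw new IllegalArgumentException("Expected 7 fields: " + line);
            }
            entries.add(new Entry(fields));
        }
        return entries;
    }

    /** Sums the length column for each depth, ordered by depth. */
    static Map<Integer, Long> totalLengthPerDepth(List<Entry> entries) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (Entry e : entries) {
            totals.merge(e.depth, e.length, Long::sum);
        }
        return totals;
    }
}
```

Grouping by {{depth}} is just one possible aggregation. Grouping by {{file}} or {{segmentId}} works the same way and may be more useful for judging how reads are spread across tar files and segments.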

> TarMK: Analyse locality of reference 
> -------------------------------------
>
>                 Key: OAK-5655
>                 URL: https://issues.apache.org/jira/browse/OAK-5655
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Priority: Major
>              Labels: scalability
>             Fix For: 1.10
>
>         Attachments: compaction-time-vs-reposize.m, compaction-time-vs.reposize.png, data00053a.tar-reads.png, offrc.jfr, segment-per-path-compacted-nocache.png, segment-per-path-compacted-nostringcache.png, segment-per-path-compacted.png, segment-per-path.png
>
>
> We need to better understand the locality aspects of content stored in TarMK: 
> * How is related content spread over segments?
> * What content do we consider related? 
> * How does locality of related content develop over time when changes are applied?
> * What changes do we consider typical?
> * What is the impact of compaction on locality? 
> * What is the impact of the deduplication caches on locality (during normal operation and during compaction)?
> * How well are checkpoints deduplicated? Can we monitor this online?
> * ...


