Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2016/06/09 09:50:21 UTC

[jira] [Commented] (OAK-3797) SegmentTracker#collectBlobReferences should retain fewer SegmentId instances

    [ https://issues.apache.org/jira/browse/OAK-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322264#comment-15322264 ] 

Alex Parvulescu commented on OAK-3797:
--------------------------------------

We've recently seen an issue with the queue mechanism where it would overflow the max capacity of the queue: {{Blob garbage collection failed: Sorry, deque too big}}.
I'm submitting a patch for review that severely reduces the working set of the queue by enforcing the de-duplication of recordids early, at {{queue#add}} time instead of at processing time. This makes an important difference: the queue size could explode because each unprocessed id's references are added even when they are already in the queue, only to be skipped later because they have been processed in the meantime.
As a bonus, I turned the {{Queue<SegmentId>}} into a {{Queue<UUID>}}.
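
To illustrate the idea of de-duplicating at add time, here is a minimal sketch. It is not the attached patch: the class {{DedupQueueSketch}}, the method {{traverse}} and the {{references}} lookup function are hypothetical names made up for this example, and the real code in {{SegmentTracker#collectBlobReferences}} may look quite different. The point is only that an id enters the queue when it is first seen, so the queue never holds more entries than there are distinct ids.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch, not the Oak implementation.
class DedupQueueSketch {

    /**
     * Breadth-first walk over ids reachable from root.
     * 'references' is an assumed lookup returning the ids a given id refers to.
     * Returns the ids in the order they were processed.
     */
    static List<UUID> traverse(UUID root, Function<UUID, List<UUID>> references) {
        Set<UUID> seen = new HashSet<>();      // every id ever queued
        Queue<UUID> queue = new ArrayDeque<>();
        List<UUID> processed = new ArrayList<>();

        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            UUID id = queue.remove();
            processed.add(id);
            for (UUID ref : references.apply(id)) {
                // Set#add returns false for an id that was already queued,
                // so duplicates are rejected here rather than at processing time.
                if (seen.add(ref)) {
                    queue.add(ref);
                }
            }
        }
        return processed;
    }
}
```

With the check at processing time instead, a heavily cross-referenced set of segments could push the same id into the queue once per referrer; with the check at add time, each id is queued at most once.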

Patch is for {{segmentmk}} because that's where the ongoing problem is, but it should also be applied to {{segment-tar}}.

[~mduerig] [~amitjain] feedback appreciated!

> SegmentTracker#collectBlobReferences should retain fewer SegmentId instances
> ----------------------------------------------------------------------------
>
>                 Key: OAK-3797
>                 URL: https://issues.apache.org/jira/browse/OAK-3797
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar, segmentmk
>            Reporter: Michael Dürig
>            Assignee: Alex Parvulescu
>              Labels: datastore, gc
>             Fix For: 1.6
>
>         Attachments: OAK-3797-segmentmk.patch
>
>
> {{SegmentTracker#collectBlobReferences}} currently keeps a queue of not yet processed {{SegmentId}} instances internally. This potentially impacts the system, as those instances are also tracked in the segment tracker's segment id tables. I think we should improve the implementation to not retain so many {{SegmentId}} instances and rely on arrays of {{msb}}, {{lsb}} instead.
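
For reference, the "arrays of {{msb}}, {{lsb}}" idea from the description could look roughly like the sketch below. This is only an illustration under assumptions: {{UuidLongQueue}} is a hypothetical class, not something that exists in Oak, and it stores each id as two consecutive longs in one {{long[]}} so that no per-id object is retained while the id waits in the queue.

```java
import java.util.NoSuchElementException;
import java.util.UUID;

// Hypothetical sketch: a FIFO queue of ids backed by a single long[].
class UuidLongQueue {
    // Each id occupies two slots: msb at an even index, lsb right after it.
    private long[] bits = new long[16];
    private int head; // index of the next pair to read
    private int tail; // index where the next pair is written

    void add(UUID id) {
        if (tail == bits.length) {
            compactOrGrow();
        }
        bits[tail++] = id.getMostSignificantBits();
        bits[tail++] = id.getLeastSignificantBits();
    }

    UUID remove() {
        if (isEmpty()) {
            throw new NoSuchElementException();
        }
        long msb = bits[head++];
        long lsb = bits[head++];
        return new UUID(msb, lsb); // materialized only when actually needed
    }

    boolean isEmpty() {
        return head == tail;
    }

    int size() {
        return (tail - head) / 2;
    }

    private void compactOrGrow() {
        int live = tail - head;
        // Double the array only when more than half of it is still in use;
        // otherwise slide the live entries to the front and reuse the space.
        long[] target = live > bits.length / 2 ? new long[bits.length * 2] : bits;
        System.arraycopy(bits, head, target, 0, live);
        bits = target;
        head = 0;
        tail = live;
    }
}
```

Whether a structure like this or a plain {{Queue<UUID>}} is the better trade-off depends on how large the queue gets in practice; the sketch only shows that the msb/lsb pairs are enough to reconstruct the id on removal.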


