Posted to oak-issues@jackrabbit.apache.org by "Amit Jain (JIRA)" <ji...@apache.org> on 2016/06/06 08:17:59 UTC

[jira] [Commented] (OAK-4430) DataStoreBlobStore#getAllChunkIds fetches DataRecord when not needed

    [ https://issues.apache.org/jira/browse/OAK-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316317#comment-15316317 ] 

Amit Jain commented on OAK-4430:
--------------------------------

The method {{DataStoreBlobStore#getAllChunkIds}} also uses the fetched DataRecord to encode the blob length in the id. Since this method has only one consumer, the {{MarkSweepGarbageCollector}}, we could change the method itself to return ids without the encoded length and state this clearly in the javadocs. Alternatively, we could add an overloaded method that returns all raw blob ids.
Either way, the GC class would need a method to obtain the raw id from a length-encoded id, because the "node store referenced blobs" collection phase returns length-encoded ids.

[~chetanm] wdyt?
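The raw-id conversion proposed above could look roughly like the following Java sketch. It assumes that a length-encoded id has the form "<rawId>#<length>"; the '#' separator and the class and method names (LengthEncodedIds, encode, rawId, toRawIds) are hypothetical and are not taken from the Oak codebase.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper; assumes ids of the form "<rawId>#<length>".
final class LengthEncodedIds {
    // Assumed separator between the raw blob id and its encoded length.
    static final char SEP = '#';

    private LengthEncodedIds() {}

    // Builds a length-encoded id, as getAllChunkIds is described to do today.
    static String encode(String rawId, long length) {
        return rawId + SEP + length;
    }

    // Strips the encoded length; ids without a separator are returned unchanged.
    static String rawId(String encodedId) {
        int idx = encodedId.lastIndexOf(SEP);
        return idx < 0 ? encodedId : encodedId.substring(0, idx);
    }

    // Lazily converts length-encoded ids (e.g. from the referenced-blobs phase)
    // to raw ids, so they can be compared with raw ids listed from the blob store.
    static Iterator<String> toRawIds(Iterator<String> encodedIds) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return encodedIds.hasNext();
            }

            @Override
            public String next() {
                return rawId(encodedIds.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> raw = toRawIds(Arrays.asList(encode("a1b2", 1024), "c3d4").iterator());
        while (raw.hasNext()) {
            System.out.println(raw.next()); // prints a1b2, then c3d4
        }
    }
}
```

Converting lazily keeps memory flat when the referenced-blobs phase yields millions of ids; whichever of the two options is chosen, the GC would then compare raw ids on both sides.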

> DataStoreBlobStore#getAllChunkIds fetches DataRecord when not needed
> --------------------------------------------------------------------
>
>                 Key: OAK-4430
>                 URL: https://issues.apache.org/jira/browse/OAK-4430
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>            Reporter: Amit Jain
>            Assignee: Amit Jain
>              Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4
>             Fix For: 1.5.3
>
>
> DataStoreBlobStore#getAllChunkIds loads the DataRecord to check whether the last-modified time criterion is satisfied against the given {{maxLastModifiedTime}}.
> A {{maxLastModifiedTime}} value of 0 effectively means that any last-modified time check should be skipped (this is currently the only usage, from MarkSweepGarbageCollector). In that case the method should not fetch the DataRecords at all, as this can be very expensive, e.g. on calls to S3 with millions of blobs.
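The fix described in the issue amounts to short-circuiting the per-id record fetch when no cutoff is given. The Java sketch below illustrates that idea under stated assumptions: RecordLookup is a hypothetical stand-in for the DataRecord fetch, the "strictly older than the cutoff" comparison is assumed, and treating negative values like 0 is also an assumption. It is not the actual Oak implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkIdFilter {
    // Hypothetical stand-in for the expensive per-id DataRecord fetch
    // (for example a metadata request to S3).
    interface RecordLookup {
        long lastModified(String id);
    }

    private ChunkIdFilter() {}

    // Returns the ids that pass the last-modified check. A maxLastModifiedTime of 0
    // (treated here, as an assumption, like any non-positive value) disables the
    // check, and the lookup is then never invoked.
    static List<String> filterIds(Iterable<String> ids, long maxLastModifiedTime, RecordLookup lookup) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            // Short-circuit: the lookup runs only when a real cutoff is given.
            if (maxLastModifiedTime <= 0 || lookup.lastModified(id) < maxLastModifiedTime) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] fetches = {0};
        RecordLookup lookup = id -> {
            fetches[0]++;
            return 100L;
        };
        List<String> ids = Arrays.asList("a", "b", "c");
        System.out.println(filterIds(ids, 0, lookup) + " fetches=" + fetches[0]);
        // prints [a, b, c] fetches=0
    }
}
```

Returning a List keeps the sketch short; with millions of blobs, a lazily filtered iterator would avoid holding every id in memory while keeping the same short-circuit.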



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)