Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2012/10/15 10:06:03 UTC

[jira] [Created] (OAK-377) Data store garbage collection

Thomas Mueller created OAK-377:
----------------------------------

             Summary: Data store garbage collection
                 Key: OAK-377
                 URL: https://issues.apache.org/jira/browse/OAK-377
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core, mk
            Reporter: Thomas Mueller
            Priority: Minor


Unused binaries in the data store need to be garbage collected.

There is a partial implementation in oak-mk, but it is currently not run: it does not run automatically, and I think there is no way to run it manually.

Also, we might want to investigate faster garbage collection algorithms: young generation garbage collection, or garbage collection using reference counting (for example using an index of references to the data store).
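
To make the reference counting idea concrete, here is a minimal sketch of an index of references to the data store. It is only an illustration: the class BlobRefCountIndex and its methods are hypothetical and not part of Oak, and the sketch ignores revisions and concurrency, which a real implementation would have to handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reference-count index for data store blobs; not an Oak API. */
final class BlobRefCountIndex {
    // blob id -> number of properties currently referencing that blob
    private final Map<String, Integer> counts = new HashMap<>();

    /** Registers a blob when it is written to the data store (count starts at 0). */
    void blobAdded(String blobId) {
        counts.putIfAbsent(blobId, 0);
    }

    /** Called when a property starts referencing the blob. */
    void referenceAdded(String blobId) {
        counts.merge(blobId, 1, Integer::sum);
    }

    /** Called when a property stops referencing the blob; never drops below 0. */
    void referenceRemoved(String blobId) {
        counts.computeIfPresent(blobId, (id, n) -> Math.max(0, n - 1));
    }

    /** Blobs with no remaining references: candidates for deletion. */
    Set<String> garbage() {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BlobRefCountIndex index = new BlobRefCountIndex();
        index.blobAdded("blob-a");
        index.blobAdded("blob-b");
        index.referenceAdded("blob-a");
        index.referenceAdded("blob-b");
        index.referenceRemoved("blob-b");
        // Prints the ids that no property references any more.
        System.out.println(index.garbage());
    }
}
```

With such an index, finding garbage would not require traversing the repository at all, at the cost of keeping the counts up to date on every write.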

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OAK-377) Data store garbage collection

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476055#comment-13476055 ] 

Thomas Mueller commented on OAK-377:
------------------------------------

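In some cases, we might want to share a data store (multiple repositories access the same data store), as was possible with Jackrabbit 2.x. This will also affect garbage collection, because a binary can only be removed once none of the repositories sharing the data store references it.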
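
A rough sketch of what sharing could mean for a mark/sweep collector: the mark phase has to cover every repository before anything is swept. The class and method names below are hypothetical, and blob ids are modelled as plain strings for illustration only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of mark/sweep over a shared data store; not an Oak API. */
final class SharedStoreGcSketch {

    /** Mark phase: union of the blob ids referenced by every repository sharing the store. */
    static Set<String> markAll(Collection<Set<String>> referencesPerRepository) {
        Set<String> marked = new HashSet<>();
        for (Set<String> refs : referencesPerRepository) {
            marked.addAll(refs);
        }
        return marked;
    }

    /** Sweep phase: blobs that are stored but were not marked by any repository. */
    static Set<String> sweepCandidates(Set<String> stored, Set<String> marked) {
        Set<String> garbage = new HashSet<>(stored);
        garbage.removeAll(marked);
        return garbage;
    }

    public static void main(String[] args) {
        Set<String> repoA = new HashSet<>(Arrays.asList("b1", "b2"));
        Set<String> repoB = new HashSet<>(Arrays.asList("b2", "b3"));
        Set<String> stored = new HashSet<>(Arrays.asList("b1", "b2", "b3", "b4"));

        Set<String> marked = markAll(Arrays.asList(repoA, repoB));
        // Only blobs unreferenced by both repositories may be deleted.
        System.out.println(sweepCandidates(stored, marked));
    }
}
```

Sweeping with the marks of a single repository would delete binaries that another repository still uses, which is why the mark results need to be combined first.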
                


[jira] [Commented] (OAK-377) Data store garbage collection

Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492267#comment-13492267 ] 

Thomas Mueller commented on OAK-377:
------------------------------------

Two ways to speed up garbage collection:

* Keep an index on node references, so that garbage collection does not need to traverse the whole repository, only the nodes that reference binaries.

* Generational garbage collection: If the blob store can keep track of all blobs added since revision X, it only needs to go through the diff from that revision to the latest one to determine which of those blobs can be removed early. Since most extra binaries are short-lived (temporary files, etc.), we'd only need to do a full mark/sweep garbage collection fairly rarely for a typical repository, for example once a week or month, or perhaps just on demand when running out of space.
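
The generational idea could look roughly like the sketch below. It rests on two simplifying assumptions that are mine, not from the issue: only the latest revision needs to keep its binaries, and a blob added after revision X can only be referenced by nodes that changed after X, so the diff is enough to decide. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of young generation blob collection; not an Oak API. */
final class YoungGenerationGcSketch {

    /**
     * @param blobsAddedSinceX      blob ids the blob store recorded since revision X
     * @param changedNodeRefsAtHead for each node path in the diff from X to the latest
     *                              revision, the blob ids that node references in the
     *                              latest revision (empty if none or if it was removed)
     * @return young blobs that nothing in the latest revision references
     */
    static Set<String> youngGarbage(Set<String> blobsAddedSinceX,
                                    Map<String, Set<String>> changedNodeRefsAtHead) {
        Set<String> garbage = new HashSet<>(blobsAddedSinceX);
        for (Set<String> refs : changedNodeRefsAtHead.values()) {
            garbage.removeAll(refs);
        }
        return garbage;
    }

    public static void main(String[] args) {
        Set<String> young = new HashSet<>(Arrays.asList("tmp-upload", "report"));

        Map<String, Set<String>> diff = new HashMap<>();
        diff.put("/content/report", Collections.singleton("report"));
        // The temporary node was created and removed again after revision X.
        diff.put("/tmp/upload", Collections.<String>emptySet());

        // Prints the young blobs that can be removed without a full traversal.
        System.out.println(youngGarbage(young, diff));
    }
}
```

Older blobs are never touched by this pass, so the occasional full mark/sweep run is still needed to reclaim binaries that became unused long after they were added.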

                
