Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2012/10/15 10:06:03 UTC
[jira] [Created] (OAK-377) Data store garbage collection
Thomas Mueller created OAK-377:
----------------------------------
Summary: Data store garbage collection
Key: OAK-377
URL: https://issues.apache.org/jira/browse/OAK-377
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, mk
Reporter: Thomas Mueller
Priority: Minor
Unused binaries in the data store need to be garbage collected.
There is a partial implementation in oak-mk, but it is currently never run: it is not run automatically, and I think there is no way to run it manually.
Also, we might want to investigate faster garbage collection algorithms: young generation garbage collection, or garbage collection using reference counting (for example using an index of references to the data store).
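For illustration only, here is a minimal sketch of the baseline mark/sweep approach that the faster algorithms would improve on. It is not the oak-mk implementation. The names and the data model are assumptions: nodes are plain dicts, a binary reference is a string value prefixed with "blob:", and the data store is a dict from blob id to content.

```python
BLOB_PREFIX = "blob:"  # hypothetical marker for a binary reference


def mark_and_sweep(nodes, data_store):
    """Remove every blob that no node references; return the removed ids.

    nodes: iterable of dicts (property name -> value), standing in for a
           traversal of the whole repository.
    data_store: dict mapping blob id -> content, modified in place.
    """
    # Mark phase: traverse all nodes and collect referenced blob ids.
    marked = set()
    for node in nodes:
        for value in node.values():
            if isinstance(value, str) and value.startswith(BLOB_PREFIX):
                marked.add(value[len(BLOB_PREFIX):])

    # Sweep phase: delete every blob that was not marked.
    removed = sorted(blob_id for blob_id in data_store if blob_id not in marked)
    for blob_id in removed:
        del data_store[blob_id]
    return removed
```

The mark phase has to visit every node, which is why the cost grows with repository size rather than with the amount of garbage.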
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OAK-377) Data store garbage collection
Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476055#comment-13476055 ]
Thomas Mueller commented on OAK-377:
------------------------------------
In some cases, we might want to share a data store (multiple repositories accessing the same data store), as was possible with Jackrabbit 2.x. This will also affect garbage collection.
[jira] [Commented] (OAK-377) Data store garbage collection
Posted by "Thomas Mueller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492267#comment-13492267 ]
Thomas Mueller commented on OAK-377:
------------------------------------
Two ways to speed up garbage collection:
* Keep an index on node references, so that it is not required to traverse the whole repository but only the nodes that reference binaries.
* Generational garbage collection: If the blob store can keep track of all blobs added since revision X, it only needs to go through the diff from that revision to the latest one to determine which of those blobs can be removed early. Since most extra binaries are short-lived (temporary files, etc.), we'd only need to do a full mark/sweep garbage collection fairly rarely for a typical repository, like once a week or month, or perhaps just on-demand when running out of extra space.
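
A rough sketch of the generational idea, under stated assumptions: the blob store can report the set of blob ids added since revision X, the diff from X to the latest revision can be reduced to the blob ids referenced by added or changed properties, and only the latest revision has to stay readable. All names here are hypothetical and not part of any Oak API.

```python
def young_generation_gc(young_blobs, refs_in_diff, data_store):
    """Remove blobs added since revision X that are no longer referenced.

    young_blobs: set of blob ids added since revision X.
    refs_in_diff: iterable of blob ids referenced by properties that were
                  added or changed between revision X and the latest one.
    data_store: dict mapping blob id -> content, modified in place.
    """
    # A young blob did not exist before X, so any live reference to it
    # must show up in the diff; older blobs are left for a full collection.
    live = set(refs_in_diff) & young_blobs
    garbage = sorted(young_blobs - live)
    for blob_id in garbage:
        data_store.pop(blob_id, None)
    return garbage
```

Blobs older than revision X are never touched here, so an occasional full mark/sweep run is still needed to reclaim them.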