Posted to oak-dev@jackrabbit.apache.org by Tomek Rekawek <re...@adobe.com> on 2015/08/24 13:47:25 UTC

persistent set of strings

Hello,

I have started working on OAK-3148, a new feature that allows blobs to be migrated gradually from one store to another without turning off the instance. To create the SplitBlobStore, I need a way to remember (and save) the ids of blobs that have already been transferred.

So, basically I need a persistent and mutable set of strings. Do we have something like this in Oak already? I thought about a few custom solutions:

1. Saving blob ids in a file (a flat text file at first, later some b-tree), with a memory cache and/or bloom filter.
  - but it adds complexity, requires maintenance, etc.
2. Creating a SegmentNodeStore, with bucketing via the hashcode
  - but running a second segment node store just to persist a bunch of ids seems a little excessive.
3. A custom cache solution, like ehcache
  - but adding a new, big library just to support this feature doesn’t seem right, as we would have to deal with dependency versions, embedding, etc.

So, maybe there is some lightweight and reliable “4” in Oak already?

Thanks,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

Re: persistent set of strings

Posted by Chetan Mehrotra <ch...@gmail.com>.
Hi Tomek,

To start with, I think a flat-file-based approach should be fine. While
working on [1], it was observed that 2M blob ids consumed 500MB of memory.
As this logic is to be implemented in oak-run, it should probably be
fine for now to just use an in-memory HashSet.
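
The flat file plus in-memory HashSet combination could be sketched
roughly as below. This is only an illustration under stated assumptions,
not code from Oak or from OAK-3148: the class name FlatFileStringSet and
its methods are hypothetical, ids are assumed to contain no line breaks,
and the file is append-only (ids are never removed).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not an Oak class): a string set backed by an append-only text file. */
class FlatFileStringSet {
    private final Path file;
    private final Set<String> ids = new HashSet<>();

    FlatFileStringSet(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try {
                // One id per line; reload everything written by earlier runs.
                ids.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Records the id in memory and on disk; returns false if it was already present. */
    synchronized boolean add(String id) {
        if (!ids.add(id)) {
            return false;
        }
        try {
            // Append a single line, creating the file on first use.
            Files.write(file, Collections.singletonList(id), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            ids.remove(id); // keep memory and disk consistent if the write failed
            throw new UncheckedIOException(e);
        }
        return true;
    }

    synchronized boolean contains(String id) {
        return ids.contains(id);
    }

    synchronized int size() {
        return ids.size();
    }
}
```

A caller would create one instance per id file, call add() after each
blob is transferred, and call contains() to decide which store to read
from. All ids stay in memory, so the memory figure quoted above applies
to this sketch as well.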

Later, if it becomes a problem, we can think of some off-heap solution.
You can also look into using MVStore, which is used in
DocumentNodeStore for the persistent cache.

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-2882?focusedCommentId=14550198&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14550198

