Posted to oak-dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2014/04/02 15:00:42 UTC

Re: svn commit: r1583994 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/OakFileDataStore.java test/java/org/apache/jackrabbit/oak/plugins/blob/datastore/OakFileDataStoreTest.java

Hi,

On Wed, Apr 2, 2014 at 8:25 AM,  <ch...@apache.org> wrote:
> +        //TODO FIXME Temporary workaround for OAK-1666. Override the default
> +        //synchronized map with a Noop. This should be removed when fix
> +        //for JCR-3764 is part of release.
> +        inUse = new NoOpMap<DataIdentifier, WeakReference<DataIdentifier>>();

This breaks the following client:

    Binary binary = session.getValueFactory().createBinary(...);
    // wait over a garbage collection cycle
    session.getRootNode().setProperty("foo", binary);
    session.save();

Note that the wait in between could be caused by anything: in the
worst case just bad timing, or more likely some other long-running
operation such as waiting for user input or creating other large
binaries.

The inUse map is in FileDataStore for a reason. If it's causing
performance issues, the right solution is *not* to just disable it but
rather to figure out how the same functionality could be achieved more
efficiently.
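
To illustrate what the inUse map protects against, here is a minimal
sketch, not the actual FileDataStore code. The map type follows the one
in the quoted commit; the class InUseSketch, the Id type, and the
create/sweep/contains methods are all hypothetical names made up for
this example.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical in-memory store; only illustrates the inUse idea.
public class InUseSketch {

    /** Hypothetical stand-in for a blob identifier. */
    static final class Id {
        final String value;

        Id(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value.equals(value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    // Weakly keyed: an entry can vanish once no client code holds the Id.
    private final Map<Id, WeakReference<Id>> inUse =
            Collections.synchronizedMap(new WeakHashMap<Id, WeakReference<Id>>());

    // Ids of the blobs currently held by this toy store.
    private final Set<String> stored = new HashSet<>();

    /** Stores a blob and records its identifier as in use. */
    Id create(String value) {
        Id id = new Id(value);
        stored.add(value);
        inUse.put(id, new WeakReference<>(id));
        return id;
    }

    /**
     * Deletes every stored blob that is neither referenced from the
     * repository nor still in use; returns the number deleted.
     */
    int sweep(Set<String> referenced) {
        Set<String> keep = new HashSet<>(referenced);
        synchronized (inUse) { // required when iterating a synchronizedMap
            for (Id id : inUse.keySet()) {
                keep.add(id.value);
            }
        }
        int before = stored.size();
        stored.retainAll(keep);
        return before - stored.size();
    }

    boolean contains(String value) {
        return stored.contains(value);
    }
}
```

As long as the client still holds the Id returned by create, a sweep
leaves the blob alone even though nothing in the repository refers to
it yet. With a no-op map, sweep would see an empty inUse set and delete
it.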

BR,

Jukka Zitting


Posted by Chetan Mehrotra <ch...@gmail.com>.
On Wed, Apr 2, 2014 at 6:30 PM, Jukka Zitting <ju...@gmail.com> wrote:
> The inUse map is in FileDataStore for a reason.

Ack. From what I understand of the Blob GC logic in Oak, it relies on
the blob's last modified value to identify blobs that are in active
use. When performing GC, only those blobs whose lastModified value is
older than, say, 1 day would be considered. Only these blobs would be
candidates for deletion. This ensures that any blob created in
transient space is not considered for GC.
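
A rough sketch of that age check, under stated assumptions: the class
BlobGcSketch and the method gcCandidates are hypothetical names, blobs
are modelled as a plain map from id to lastModified time in
milliseconds, and this is not Oak's actual GC code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; only illustrates filtering by lastModified age.
public class BlobGcSketch {

    /**
     * Returns the ids of blobs whose lastModified time lies more than
     * maxAgeMillis before nowMillis. Younger blobs are never candidates.
     */
    static List<String> gcCandidates(Map<String, Long> lastModifiedById,
                                     long nowMillis, long maxAgeMillis) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastModifiedById.entrySet()) {
            if (nowMillis - e.getValue() > maxAgeMillis) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDay = TimeUnit.DAYS.toMillis(1);

        Map<String, Long> blobs = new TreeMap<>();
        blobs.put("old-blob", now - 2 * oneDay);                  // older than the threshold
        blobs.put("fresh-blob", now - TimeUnit.MINUTES.toMillis(5)); // e.g. still transient

        System.out.println(gcCandidates(blobs, now, oneDay));
    }
}
```

A blob that passes this filter is only a candidate; whether it is
actually deleted still depends on it being unreferenced.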

So the current logic does assume that 1 day is sufficient time, and
hence it is not foolproof. However, the current impl of inUse would
probably only work for a single-node system and would fail in a shared
DataStore scenario, as it's in-memory state and it's hard to determine
the inUse state for the whole cluster. To support such a case we would
have to rely on the lastModified time interval to identify blobs in
active use.

regards
Chetan

Chetan Mehrotra