Posted to users@jackrabbit.apache.org by Alexander Wallace <aw...@rwmotloc.com> on 2009/01/13 20:59:46 UTC

Cluster Datastore garbage collection.

Hi all!

I think the answer to the following question is "yes" after reading this
wiki page:

http://wiki.apache.org/jackrabbit/DataStore

But I want to verify:

If I have a cluster configuration but only one repository using the
data store, I can use the normal garbage collection (i.e.
gc.deleteUnused();), correct?

The [Manually delete files with last modified date older than X] step (from
that page) only applies when multiple repositories share the same
datastore...
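
For reference, here is a rough Java sketch of only that final manual deletion step. It assumes the datastore is a plain directory tree of files and that every repository sharing it has already run its scan, so that records still in use carry a fresh last-modified time. The class name OldFileSweeper and the method deleteOlderThan are hypothetical and not part of Jackrabbit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Jackrabbit class.
final class OldFileSweeper {

    /**
     * Deletes every regular file under root whose last-modified time is
     * strictly before cutoff, and returns the paths that were deleted.
     */
    static List<Path> deleteOlderThan(Path root, Instant cutoff) throws IOException {
        // Collect the candidates first so the directory walk is closed
        // before any file is removed.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        List<Path> deleted = new ArrayList<>();
        for (Path file : files) {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.toInstant().isBefore(cutoff)) {
                Files.delete(file);
                deleted.add(file);
            }
        }
        return deleted;
    }
}
```

The cutoff would be the time X noted before the scans started, so anything touched during a scan survives.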

Thanks in advance for this info!


Re: Cluster Datastore garbage collection.

Posted by Alexander Wallace <aw...@rwmotloc.com>.
Alexander Klimetschek wrote:
> On Tue, Jan 13, 2009 at 8:59 PM, Alexander Wallace <aw...@rwmotloc.com> wrote:
>   
>
> I would think so, but I am not 100% sure what happens if one node
> already has removed the reference to the datastore entry (aka marked
> unused during gc), but another cluster node still wants to read from
> it, because the removed node or property event hasn't arrived yet...
>
>   
Got it... Since the cluster synchronization happens very often, and it
sounds like running GC is not a super high priority, it seems that it
would be safer to run the GC, say, once a day during a very low-traffic
window, perhaps at 3 am or so, to minimize user exposure...
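
A minimal sketch of such a once-a-day schedule in plain Java, using only the JDK scheduler. The class NightlyGcScheduler and its methods are hypothetical names, and gcTask is a placeholder for whatever actually invokes the garbage collection; nothing here is Jackrabbit API.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a Jackrabbit class.
final class NightlyGcScheduler {

    /** Time from now until the next occurrence of runAt (today if still ahead, otherwise tomorrow). */
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Runs gcTask at the next runAt and then every 24 hours after that. */
    static ScheduledExecutorService scheduleDaily(Runnable gcTask, LocalTime runAt) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        executor.scheduleAtFixedRate(
                gcTask, initialDelayMillis, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Usage would be something like NightlyGcScheduler.scheduleDaily(task, LocalTime.of(3, 0)). The fixed 24-hour period drifts by an hour across daylight-saving changes, which probably does not matter for a low-traffic window.
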
> Regards,
> Alex
>
>   
Thanks a lot for the response!

Alex.

Re: Cluster Datastore garbage collection.

Posted by Alexander Klimetschek <ak...@day.com>.
On Tue, Jan 13, 2009 at 8:59 PM, Alexander Wallace <aw...@rwmotloc.com> wrote:
> I think the answer to the following question is "yes" after reading this wiki
> page:
>
> http://wiki.apache.org/jackrabbit/DataStore
>
> But I want to verify:
>
> If I have a cluster configuration but only one repository using the data
> store, I can use the normal garbage collection (i.e. gc.deleteUnused();),
> correct?

I would think so, but I am not 100% sure what happens if one node
already has removed the reference to the datastore entry (aka marked
unused during gc), but another cluster node still wants to read from
it, because the removed node or property event hasn't arrived yet...

>
> The [Manually delete files with last modified date older than X] step (from that
> page) only applies when multiple repositories share the same
> datastore...

Right. There is no synchronization mechanism in the datastore for
access by multiple repositories, which would be required for the
garbage collection to find out which entries are unused by _all_
repositories before deleting them. Collision is avoided purely by the
use of UUIDs/hashes, so when writing to the datastore no
synchronization is required. See also
https://issues.apache.org/jira/browse/JCR-1865
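
To illustrate why writes need no coordination, here is a small Java sketch of a content-derived identifier. It assumes a SHA-1 hash of the record's bytes is used as the identifier, which is one way to read "hashes" above; the class name ContentId is hypothetical and not Jackrabbit API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not a Jackrabbit class.
final class ContentId {

    /** Returns the hex-encoded SHA-1 digest of content, used here as the record identifier. */
    static String of(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Two repositories storing the same bytes compute the same identifier independently, and different bytes get different identifiers, so neither writer has to ask the other before writing. Deleting is the hard part, because no single repository knows whether another one still references a given identifier.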


Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com