Posted to users@jackrabbit.apache.org by Pankaj_Gupta <pa...@ansys.com> on 2008/03/18 00:49:52 UTC

Datastore garbage collector removes in-use files

Hi,

I am using the garbage collection strategy as explained in the DataStore
wiki page, but it is not working as expected. Besides removing unused
files, it also removes files that are still in use. I am using Jackrabbit
1.4.1 with a MySQL database and BundleDbPersistenceManager.

I found that the gc.scan() method is not updating the modification time of
all files in the datastore. The behavior is somewhat erratic. The
modification time of a few files is updated when gc starts but other files
are left untouched. When I run gc.scan() and gc.stopScan() repeatedly, then
the modification time of the same set of files is updated again and the rest
are always left untouched. To keep track of the file modification time I am
not calling gc.deleteUnused(). 
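
To illustrate why an untouched modification time matters, here is a toy model of a timestamp-based mark-and-sweep cycle. It is only a sketch and not Jackrabbit's implementation: the class ToyDataStoreGc and its methods are hypothetical, timestamps are passed in explicitly, and it assumes the sweep step deletes every record whose modification time is older than the start of the scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ToyDataStoreGc {
    // Record identifier -> last-modified time in milliseconds (stands in for files on disk)
    private final Map<String, Long> lastModified = new HashMap<>();
    private long scanStart;

    public void add(String id, long time) {
        lastModified.put(id, time);
    }

    /** Mark phase: "touch" every record that the scan actually reaches. */
    public void scan(Set<String> reachedIds, long now) {
        scanStart = now;
        for (String id : reachedIds) {
            if (lastModified.containsKey(id)) {
                lastModified.put(id, now);
            }
        }
    }

    /** Sweep phase: delete every record not touched since the scan started. */
    public int deleteUnused() {
        int before = lastModified.size();
        lastModified.values().removeIf(t -> t < scanStart);
        return before - lastModified.size();
    }

    public boolean contains(String id) {
        return lastModified.containsKey(id);
    }
}
```

In this model, a record that is still referenced but never reached by scan() keeps its old timestamp and is swept together with the unused ones, which matches the symptom described above.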

To further track this down I downloaded the latest source of
GarbageCollector and created a new instance of it directly. This is shown in
the following listing:

    Credentials creds =
        new SimpleCredentials("user", "password".toCharArray());
    Session session = rep.login(creds, "root");

    // Open one session per accessible workspace
    String[] workspaceNames =
        session.getWorkspace().getAccessibleWorkspaceNames();
    Session[] sessionList = new Session[workspaceNames.length];
    int count = 0;
    for (String ws : workspaceNames) {
        Session s = rep.login(creds, ws);
        sessionList[count++] = s;
    }
    GarbageCollector gc =
        new GarbageCollector((SessionImpl) session, null, sessionList);

Now if I call gc.scan() and gc.stopScan() the modification time of all files
is updated as expected.

I thought the issue might lie in how nodes are traversed with
IterablePersistenceManagers, so I switched to SimpleDbPersistenceManager.
This should have forced gc to use sessionList for iterating over all nodes.
But when I do this, the modification time of no file is updated. This is true
whether I use my own GarbageCollector or the built-in one by calling
session.createDataStoreGarbageCollector(). Through print statements in my
GarbageCollector I can see that all the nodes are getting scanned, but
somehow the file modification time isn't updated.
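
One way to check which files a scan actually touched is to list the files whose modification time is older than the moment the scan started. The following is a minimal sketch using only the JDK; the class DataStoreTouchCheck and the method untouchedSince are hypothetical names, and the code assumes the datastore is a plain directory tree on the local file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DataStoreTouchCheck {

    /**
     * Returns all regular files under root whose last-modified time is
     * strictly older than sinceMillis (milliseconds since the epoch).
     * Note: some file systems store timestamps with coarse resolution,
     * so leave a margin of a few seconds around the scan start time.
     */
    public static List<Path> untouchedSince(Path root, long sinceMillis)
            throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toFile().lastModified() < sinceMillis)
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Recording System.currentTimeMillis() just before gc.scan() and passing it as sinceMillis after gc.stopScan() would list the files the scan left untouched, without ever calling gc.deleteUnused().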

Any help will be greatly appreciated, since without proper cleanup we can't
use the datastore functionality.

-- 
View this message in context: http://www.nabble.com/Datastore-garbage-collector-removes-in-use-files-tp16108822p16108822.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Datastore garbage collector removes in-use files

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

You are right, there is a problem using the data store with
SimpleDbPersistenceManager. This is a limitation of the current
implementation. I have now updated the wiki at
http://wiki.apache.org/jackrabbit/DataStore

Regards,
Thomas

On Tue, Mar 18, 2008 at 3:04 AM, Pankaj_Gupta <pa...@ansys.com> wrote:
>
>  After a little more digging I found why the SimpleDbPersistenceManager wasn't
>  working for either the custom or built-in garbage collector.
>
>  It seems that when SimpleDbPersistenceManager is used then Jackrabbit stores
>  the file content in both the datastore and the blobs folder, even if
>  datastore has been specified in repository.xml and
>  org.jackrabbit.useDataStore is set to true. And when reading the content it
>  just reads it from the blobs folder. So the content written in DataStore is
>  redundant.
>
>  By keeping everything else the same and only changing the persistence
>  manager to BundleDbPersistenceManager the file storage works as expected.
>  The blobs folder doesn't get created and the file content is written only in
>  datastore.
>
>  This is getting really weird and is making me wonder whether I am doing
>  something fundamentally wrong here.

Re: Datastore garbage collector removes in-use files

Posted by Pankaj_Gupta <pa...@ansys.com>.
After a little more digging I found why the SimpleDbPersistenceManager wasn't
working for either the custom or built-in garbage collector.

It seems that when SimpleDbPersistenceManager is used then Jackrabbit stores
the file content in both the datastore and the blobs folder, even if
datastore has been specified in repository.xml and
org.jackrabbit.useDataStore is set to true. And when reading the content it
just reads it from the blobs folder. So the content written in DataStore is
redundant.

By keeping everything else the same and only changing the persistence
manager to BundleDbPersistenceManager the file storage works as expected.
The blobs folder doesn't get created and the file content is written only in
datastore.

This is getting really weird and is making me wonder whether I am doing
something fundamentally wrong here.


Re: Datastore garbage collector removes in-use files

Posted by Pankaj_Gupta <pa...@ansys.com>.
Yes, using 1.4.2 fixes the problem. I knew about the other issue but didn't
think it would apply here since I am using just one thread of execution and
there are no synchronization issues involved. But apparently that fix
resolves this problem as well.

I am eagerly awaiting the release of 1.4.2, since our product, which is based
on Jackrabbit, is scheduled to be released in 2 weeks.

Thanks,
Pankaj


Re: Datastore garbage collector removes in-use files

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> I am using the garbage collection strategy as explained in the DataStore
> wiki page but it is not working as expected.

I fixed a bug in this area that I think is related:
https://issues.apache.org/jira/browse/JCR-1414
The fix will be included in release 1.4.2. Please note that this version
is not yet released, but you can download it at:

http://people.apache.org/~jukka/jackrabbit/jackrabbit-core-1.4.2/

Could you test your application again with this version and post the result?

Thanks a lot for your help!

Regards,
Thomas