Posted to users@jackrabbit.apache.org by Fabián Mandelbaum <fm...@gmail.com> on 2011/04/29 14:03:11 UTC

DataStore files not removed when content is removed

Hello,

while doing some tests yesterday I noticed the following
(unexpected) behaviour with Jackrabbit 2.2.5 (1.6.x probably behaves
the same):

I have a repository with lots of data (mostly files and folders); the
on-disk repository folder takes 10GB.

I remove most of the data with standard JCR APIs, but the on-disk
repository folder still takes 10GB.

I use the backup feature of the jackrabbit-standalone-2.2.5.jar
(slightly modified by me to include the H2 jar files, because I use H2
as the storage backend) to copy the 10GB repo folder to a new folder.
The copied on-disk repository folder, which holds only the data left
after the removal on the original repo, takes (the expected) 1GB.

Why isn't the datastore removing the files belonging to the removed
data? (I use the standard repository.xml configuration file, just
replacing Derby with H2.)

I hope I have been clear enough. Waiting for an answer; thanks in advance.

-- 
Fabián Mandelbaum
IS Engineer

Re: DataStore files not removed when content is removed

Posted by Fabián Mandelbaum <fm...@gmail.com>.
Hello Alex,

yes, I did read it, but it seems I missed a clear mention of the
possibility of running it concurrently with a live repository (and the
code example given led me to think that the repo should be stopped,
because with an in-JVM - model 1 - repo, no more than one app can
access the repo at the same time).

Thanks for clearing that up for me, and sorry if I gave you the
impression of being lazy (you can rest assured I'm not).

Have a nice weekend!

On Fri, Apr 29, 2011 at 10:27 AM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> On 29.04.11 14:53, "Fabián Mandelbaum" <fm...@gmail.com> wrote:
>>When you say 'on a scheduled basis' does this mean that you have to
>>stop the repo, run the gc little example (probably adapted to one's
>>repo config) and then start the repo again? Or can GC be run as a
>>thread (that wakes up from time to time, thus 'on a scheduled basis'
>>too) while the repo is still running?
>
> Have you read
> http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection
> (first hit in Justin's query)?
>
> Yes, you can run it while the repo is running and you can do it in a
> separate thread.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel
>



-- 
Fabián Mandelbaum
IS Engineer

Re: DataStore files not removed when content is removed

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 29.04.11 14:53, "Fabián Mandelbaum" <fm...@gmail.com> wrote:
>When you say 'on a scheduled basis' does this mean that you have to
>stop the repo, run the gc little example (probably adapted to one's
>repo config) and then start the repo again? Or can GC be run as a
>thread (that wakes up from time to time, thus 'on a scheduled basis'
>too) while the repo is still running?

Have you read 
http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection
(first hit in Justin's query)?

Yes, you can run it while the repo is running and you can do it in a
separate thread.

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel
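
To illustrate the answer above, here is a minimal sketch of running
data store garbage collection periodically in a background thread
while the application keeps running. The Collector interface is a
hypothetical stand-in, not Jackrabbit's actual API; it only mirrors
the mark and sweep phases that the linked wiki page describes. The
class name, thread name and period are assumptions too. With a real
repository the collector would be obtained as the wiki page shows,
and its methods may throw checked exceptions that this stand-in omits.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ScheduledDataStoreGc {

    /** Hypothetical stand-in for a data store garbage collector. */
    interface Collector {
        void mark();   // record which binaries are still referenced
        int sweep();   // delete unreferenced binaries, return how many
        void close();  // release resources held by the collector
    }

    /** Runs one full mark-and-sweep pass and always closes the collector. */
    static int runOnce(Collector gc) {
        try {
            gc.mark();
            return gc.sweep();
        } finally {
            gc.close();
        }
    }

    /**
     * Starts a daemon thread that runs one pass per period, using a
     * fresh collector for each pass. The caller shuts the executor down.
     */
    static ScheduledExecutorService schedule(
            Supplier<Collector> factory, long period, TimeUnit unit) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "datastore-gc");
                    t.setDaemon(true);
                    return t;
                });
        executor.scheduleWithFixedDelay(() -> {
            try {
                runOnce(factory.get());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so a failed pass is only reported here.
                System.err.println("Data store GC pass failed: " + e);
            }
        }, period, period, unit);
        return executor;
    }
}
```

A fixed delay (rather than a fixed rate) is used so that a slow pass
over a large repository never overlaps with the next one.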





Re: DataStore files not removed when content is removed

Posted by Fabián Mandelbaum <fm...@gmail.com>.
Justin, thank you very much for pointing me in the right direction.
One more thing:

When you say 'on a scheduled basis', does this mean that you have to
stop the repo, run the little GC example (probably adapted to one's
repo config) and then start the repo again? Or can GC be run in a
thread (that wakes up from time to time, thus 'on a scheduled basis'
too) while the repo is still running?

Thanks again.

On Fri, Apr 29, 2011 at 9:19 AM, Justin Edelson
<ju...@justinedelson.com> wrote:
> http://www.google.com/search?q=datastore+garbage+collection
>
> Typically, it is done on a scheduled basis.
>
> Justin



-- 
Fabián Mandelbaum
IS Engineer

Re: DataStore files not removed when content is removed

Posted by Justin Edelson <ju...@justinedelson.com>.
http://www.google.com/search?q=datastore+garbage+collection

Typically, it is done on a scheduled basis.

Justin

On Apr 29, 2011, at 8:16 AM, Fabián Mandelbaum <fm...@gmail.com> wrote:

> How do I do such a thing? If I have to do it every time an item (file)
> is deleted, wouldn't it be too resource-costly?

Re: DataStore files not removed when content is removed

Posted by Fabián Mandelbaum <fm...@gmail.com>.
How do I do such a thing? If I have to do it every time an item (file)
is deleted, wouldn't it be too resource-costly?

On Fri, Apr 29, 2011 at 9:10 AM, Justin Edelson
<ju...@justinedelson.com> wrote:
> Did you do a datastore garbage collection?
>
> Justin



-- 
Fabián Mandelbaum
IS Engineer

Re: DataStore files not removed when content is removed

Posted by Justin Edelson <ju...@justinedelson.com>.
Did you do a datastore garbage collection?

Justin
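
For readers wondering why a garbage collection is needed at all, the
following toy sketch may help. It assumes, as the data store wiki page
linked elsewhere in this thread describes, that binaries are stored
once under an identifier derived from their content and that removing
a node only drops a reference to the record. ToyDataStore and all of
its methods are hypothetical and in-memory; this is not Jackrabbit
code, and the identifier here is a simple hash rather than a real
content digest.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ToyDataStore {
    // record id -> binary content
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores content under an id derived from the content itself. */
    String put(byte[] content) {
        // Toy identifier; a real store would use a cryptographic digest.
        String id = Integer.toHexString(Arrays.hashCode(content));
        records.putIfAbsent(id, content.clone());
        return id;
    }

    int size() {
        return records.size();
    }

    /**
     * Marks every id that is still referenced, then sweeps the rest.
     * Returns the number of records deleted.
     */
    int collect(Iterable<String> referencedIds) {
        Set<String> marked = new HashSet<>();
        for (String id : referencedIds) {
            marked.add(id);
        }
        int before = records.size();
        records.keySet().retainAll(marked);
        return before - records.size();
    }
}
```

Because the same record may be referenced from several places, the
store cannot safely delete it when one reference goes away; only a
scan of everything still referenced tells it which records are unused.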
