Posted to users@jackrabbit.apache.org by Euan Green <eu...@gmail.com> on 2009/11/12 10:21:27 UTC

Garbage Collection not deleting until restart of server

Hi,

I am running garbage collection but it seems the files in the repository are
only removed after I restart my application server (Tomcat). I think there
may be a problem with removing items from the repository due to locking or
threading but I'm not sure. Can anyone shed any light on this?

I'm using this code for Garbage Collection:

SessionImpl si = (SessionImpl)session;
GarbageCollector gc = si.createDataStoreGarbageCollector();
gc.scan();
gc.stopScan();
gc.deleteUnused();
gc.close();

Regards,

Euan
-- 
View this message in context: http://n4.nabble.com/Garbage-Collection-not-deleting-until-restart-of-server-tp620143p620143.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Garbage Collection not deleting until restart of server

Posted by Guo Du <mr...@gmail.com>.
On Thu, Nov 12, 2009 at 1:10 PM, Thomas Müller <th...@day.com> wrote:
> I think it's better to document the problem than to try to solve it,
> especially because it's almost impossible to solve it. I have
> documented it now at
> http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection

Knowing the limitation is good enough. Thank you very much for
documenting this.

>> Normal application code shouldn't be aware of System.gc().
> I believe that normal applications don't need to reclaim the disk
> space _immediately_. Normal applications don't need to call
> System.gc().
I believe GarbageCollector will only be invoked by management tasks
when there is a requirement to release the space. It doesn't need to
happen _immediately_ for a normal user or process.

Since it's now documented, calling System.gc() is no problem.

Thanks!

--Guo

Re: Garbage Collection not deleting until restart of server

Posted by Thomas Müller <th...@day.com>.
Hi,

> I would suggest moving the System.gc() call into GarbageCollector.

I think it's better to document the problem than to try to solve it,
especially because it's almost impossible to solve it. I have
documented it now at
http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection

> When we call deleteUnused(), we expect it to do the job.

Unfortunately, with the current Jackrabbit implementation, it's not
possible to guarantee that. The weak references map is required,
because the node with the binary might not be persisted yet.
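
To illustrate why the in-use map matters, here is a simplified model of
the scan and deleteUnused() phases. It is a sketch under assumptions, not
Jackrabbit's implementation: the class SweepSketch, its fields, and the
timestamp-based sweep are hypothetical, and a plain HashSet stands in for
the weak references map so that the behaviour is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class SweepSketch {
    /** Hypothetical store: identifier -> last-modified timestamp. */
    final Map<String, Long> records = new HashMap<String, Long>();

    /** Identifiers handed out to sessions, possibly not yet saved in a node. */
    final Set<String> inUse = new HashSet<String>();

    /** Mark phase: touch every record that a persisted node refers to. */
    void scan(Set<String> referencedByPersistedNodes, long now) {
        for (String id : referencedByPersistedNodes) {
            if (records.containsKey(id)) {
                records.put(id, now);
            }
        }
    }

    /** Sweep phase: delete records older than the scan start, unless in use. */
    int deleteUnused(long scanStart) {
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = records.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() < scanStart && !inUse.contains(e.getKey())) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        SweepSketch store = new SweepSketch();
        store.records.put("saved", 1L);    // referenced by a persisted node
        store.records.put("orphan", 1L);   // no longer referenced anywhere
        store.records.put("pending", 1L);  // stored, but its node is not saved yet
        store.inUse.add("pending");

        long scanStart = 10L;
        store.scan(Collections.singleton("saved"), scanStart);
        System.out.println("deleted: " + store.deleteUnused(scanStart));
        System.out.println("remaining: " + store.records.keySet());
    }
}
```

In this model the scan cannot see the "pending" binary because its node
has not been saved, so only the in-use entry keeps it from being deleted.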

> Normal application code shouldn't be aware of System.gc().

I believe that normal applications don't need to reclaim the disk
space _immediately_. Normal applications don't need to call
System.gc().

Regards,
Thomas

Re: Garbage Collection not deleting until restart of server

Posted by Guo Du <mr...@gmail.com>.
On Thu, Nov 12, 2009 at 10:50 AM, Euan Green <eu...@gmail.com> wrote:
>>If you want to ensure all files are deleted, an option is to call
>>System.gc() a few times before running the data store garbage
>>collection.
>
> That's exactly what it was. I just ran some tests, and the files are
> now deleted if System.gc() is run before the garbage collection for
> the repository.
I would suggest moving the System.gc() call into GarbageCollector.

When we call deleteUnused(), we expect it to do the job.
Normal application code shouldn't be aware of System.gc().

--Guo

Re: Garbage Collection not deleting until restart of server

Posted by Euan Green <eu...@gmail.com>.
>The data store keeps a list of 'recently used items' in a weak hash
>map (FileDataStore.inUse). "All data identifiers that are currently in
>use are in this set until they are garbage collected." Garbage
>collection here means Java garbage collection. That means as long as
>the identifiers are in the Java heap, the files are not deleted.

>The test cases call FileDataStore.clearInUse(), however this shouldn't
>be used for production, just for testing.

>If you want to ensure all files are deleted, an option is to call
>System.gc() a few times before running the data store garbage
>collection. 

Many thanks! 

That's exactly what it was. I just ran some tests, and the files are
now deleted if System.gc() is run before the garbage collection for
the repository.

Thanks for the time and effort helping!

Euan


Re: Garbage Collection not deleting until restart of server

Posted by Thomas Müller <th...@day.com>.
Hi,

I think I know what the problem is.

The data store keeps a list of 'recently used items' in a weak hash
map (FileDataStore.inUse). "All data identifiers that are currently in
use are in this set until they are garbage collected." Garbage
collection here means Java garbage collection. That means as long as
the identifiers are in the Java heap, the files are not deleted.
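
The effect can be reproduced with a plain java.util.WeakHashMap. The
sketch below is not Jackrabbit code; InUseSketch, Id, use() and
mayDelete() are hypothetical names chosen for illustration. Because
System.gc() is only a request, the second printed result can vary
between JVMs.

```java
import java.util.Map;
import java.util.WeakHashMap;

class InUseSketch {
    /** Hypothetical stand-in for a data identifier (content hash of a binary). */
    static final class Id {
        final String hash;

        Id(String hash) {
            this.hash = hash;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).hash.equals(hash);
        }

        @Override
        public int hashCode() {
            return hash.hashCode();
        }
    }

    /** Keys are only weakly referenced: an entry lives as long as its key object. */
    static final Map<Id, Boolean> inUse = new WeakHashMap<Id, Boolean>();

    /** Registers an identifier as in use and hands it to the caller. */
    static Id use(String hash) {
        Id id = new Id(hash);
        inUse.put(id, Boolean.TRUE);
        return id;
    }

    /** A file may be deleted only if no live identifier for it remains. */
    static boolean mayDelete(String hash) {
        return !inUse.containsKey(new Id(hash));
    }

    public static void main(String[] args) {
        Id held = use("dba3b4");
        System.gc();
        boolean whileHeld = mayDelete("dba3b4");
        // Reading 'held' here keeps the identifier strongly reachable until now.
        System.out.println(held.hash + " deletable while referenced: " + whileHeld);

        held = null; // drop the last strong reference
        for (int i = 0; i < 5 && !mayDelete("dba3b4"); i++) {
            System.gc(); // only a request; the JVM may not collect immediately
        }
        System.out.println("deletable after release: " + mayDelete("dba3b4"));
    }
}
```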

The test cases call FileDataStore.clearInUse(), however this shouldn't
be used for production, just for testing.

If you want to ensure all files are deleted, an option is to call
System.gc() a few times before running the data store garbage
collection.
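
As a sketch of that workaround: the helper below requests a few JVM
collections and then runs a task. GcBeforeTask and runAfterGc are
hypothetical names, and the round count and pause are arbitrary
assumptions. The task would wrap the scan(), stopScan() and
deleteUnused() calls from the original post.

```java
class GcBeforeTask {
    /**
     * Requests a JVM garbage collection 'rounds' times, pausing briefly
     * after each request, and then runs the task exactly once.
     */
    static void runAfterGc(int rounds, long pauseMillis, Runnable task) {
        for (int i = 0; i < rounds; i++) {
            System.gc(); // only a request; the JVM is free to ignore it
            try {
                Thread.sleep(pauseMillis); // give reference processing some time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag
                break;
            }
        }
        task.run();
    }

    public static void main(String[] args) {
        runAfterGc(3, 50L, new Runnable() {
            public void run() {
                // Placeholder: run the data store garbage collection here.
                System.out.println("running data store garbage collection");
            }
        });
    }
}
```

Even with this helper there is no guarantee that every file is removed
in the same run, since the JVM decides when weak references are cleared.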

Regards,
Thomas



On Thu, Nov 12, 2009 at 11:26 AM, Euan Green <eu...@gmail.com> wrote:
>
>>Thanks! You wrote that some files are still there after the data store
>>garbage collection ran, but they are removed after restarting Tomcat.
>>Did I understand that correctly? That's strange because the data store
>>only deletes files during the deleteUnused() call, never at startup,
>>unless your application calls deleteUnused() at startup. Could you
>>provide a file listing (dir /s I believe) of the datastore directory
>>before calling garbage collection, after calling garbage collection,
>>and after restarting Tomcat?
>
> Hi, I probably wasn't clear enough. I run a garbage collection process from
> a batch file every minute (the method is contained within the same
> application that adds to the repository), so after restarting the
> application the garbage collection will run and the files will be deleted. If
> I add a new item to the datastore AFTER I have restarted the application,
> this new one is NOT removed when garbage collection first runs, but the old
> one is.
>
> See below for file listings you asked for.
>
> Regards,
>
> Euan
>
>

Re: Garbage Collection not deleting until restart of server

Posted by Euan Green <eu...@gmail.com>.
>Thanks! You wrote that some files are still there after the data store
>garbage collection ran, but they are removed after restarting Tomcat.
>Did I understand that correctly? That's strange because the data store
>only deletes files during the deleteUnused() call, never at startup,
>unless your application calls deleteUnused() at startup. Could you
>provide a file listing (dir /s I believe) of the datastore directory
>before calling garbage collection, after calling garbage collection,
>and after restarting Tomcat?

Hi, I probably wasn't clear enough. I run a garbage collection process from
a batch file every minute (the method is contained within the same
application that adds to the repository), so after restarting the
application the garbage collection will run and the files will be deleted. If
I add a new item to the datastore AFTER I have restarted the application,
this new one is NOT removed when garbage collection first runs, but the old
one is.

See below for file listings you asked for.

Regards,

Euan

Before garbage collection:
C:\apache-tomcat-5.5.26\repository\repository\datastore>dir /s
 Volume in drive C has no label.
 Volume Serial Number is 0408-167E

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          db
               0 File(s)              0 bytes

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore\db

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          a3
               0 File(s)              0 bytes

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore\db\a3

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          b4
               0 File(s)              0 bytes

 Directory of
C:\apache-tomcat-5.5.26\repository\repository\datastore\db\a3\b4

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13         1,855,295 dba3b49091e3fc230c352efe019aecf1f421aa8b
               1 File(s)      1,855,295 bytes

     Total Files Listed:
               1 File(s)      1,855,295 bytes
              11 Dir(s)  69,862,002,688 bytes free


After Garbage Collection:
C:\apache-tomcat-5.5.26\repository\repository\datastore>dir /s
 Volume in drive C has no label.
 Volume Serial Number is 0408-167E

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          db
               0 File(s)              0 bytes

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore\db

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          a3
               0 File(s)              0 bytes

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore\db\a3

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:13    <DIR>          b4
               0 File(s)              0 bytes

 Directory of
C:\apache-tomcat-5.5.26\repository\repository\datastore\db\a3\b4

12/11/2009  10:13    <DIR>          .
12/11/2009  10:13    <DIR>          ..
12/11/2009  10:16         1,855,295 dba3b49091e3fc230c352efe019aecf1f421aa8b
               1 File(s)      1,855,295 bytes

     Total Files Listed:
               1 File(s)      1,855,295 bytes
              11 Dir(s)  69,861,986,304 bytes free

After Garbage Collection has first run after restart of server:
C:\apache-tomcat-5.5.26\repository\repository\datastore>dir /s
 Volume in drive C has no label.
 Volume Serial Number is 0408-167E

 Directory of C:\apache-tomcat-5.5.26\repository\repository\datastore

12/11/2009  10:17    <DIR>          .
12/11/2009  10:17    <DIR>          ..
               0 File(s)              0 bytes

     Total Files Listed:
               0 File(s)              0 bytes
               2 Dir(s)  69,863,792,640 bytes free



Re: Garbage Collection not deleting until restart of server

Posted by Thomas Müller <th...@day.com>.
Hi,

Thanks! You wrote that some files are still there after the data store
garbage collection ran, but they are removed after restarting Tomcat.
Did I understand that correctly? That's strange because the data store
only deletes files during the deleteUnused() call, never at startup,
unless your application calls deleteUnused() at startup. Could you
provide a file listing (dir /s I believe) of the datastore directory
before calling garbage collection, after calling garbage collection,
and after restarting Tomcat?
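
If a shell is not handy, a similar listing can be produced from Java. This
is a minimal sketch, assuming Java 8 or later; DataStoreListing and list()
are hypothetical names, and the default path in main is only a
placeholder for the real datastore directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DataStoreListing {
    /** Returns "relative/path (N bytes)" for every regular file under root, sorted. */
    static List<String> list(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p) + " (" + p.toFile().length() + " bytes)")
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the real datastore directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "repository/datastore");
        for (String line : list(root)) {
            System.out.println(line);
        }
    }
}
```

Running it before garbage collection, after garbage collection, and after
a restart gives three listings that can be compared directly.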

Regards,
Thomas




On Thu, Nov 12, 2009 at 10:56 AM, Euan Green <eu...@gmail.com> wrote:
>
> Hi,
>
> I mean the files in the datastore directory. I'm viewing them using Windows
> Explorer and refreshing after deletes.
>
> I'm using a FileDataStore as well, see repository.xml below:
> I'm using Jackrabbit 1.6.0
>
> Regards,
>
> Euan
>
>

Re: Garbage Collection not deleting until restart of server

Posted by Euan Green <eu...@gmail.com>.
Hi,

I mean the files in the datastore directory. I'm viewing them using Windows
Explorer and refreshing after deletes.

I'm using Jackrabbit 1.6.0 with a FileDataStore; see repository.xml below:

Regards,

Euan

<?xml version="1.0"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.6//EN"
                            "http://jackrabbit.apache.org/dtd/repository-1.6.dtd">
<!-- Example Repository Configuration File -->
<Repository>

    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">

    </FileSystem>

    <!--
        security configuration
    -->
    <Security appName="Jackrabbit">
        <!--
            access manager:
            class: FQN of class implementing the AccessManager interface
        -->
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
            <!--  -->
        </AccessManager>

        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <!-- anonymous user name ('anonymous' is the default value) -->

            <!--
                default user name to be used instead of the anonymous user
                when no login credentials are provided (unset by default)
            -->
            <!--  -->
        </LoginModule>
    </Security>

    <!--
        location of workspaces root directory and name of default workspace
    -->
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <!--
        workspace configuration template:
        used to create the initial workspace if there's no workspace yet
    -->
    <Workspace name="${wsp.name}">
        <!--
            virtual file system of the workspace:
            class: FQN of class implementing the FileSystem interface
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">

        </FileSystem>
        <!--
            persistence manager of the workspace:
            class: FQN of class implementing the PersistenceManager interface
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager">

        </PersistenceManager>
        <!--
            Search index and the file system it uses.
            class: FQN of class implementing the QueryHandler interface
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">

        </SearchIndex>
    </Workspace>

    <!--
        Configures the versioning
    -->
    <Versioning rootPath="${rep.home}/version">
        <!--
            Configures the filesystem to use for versioning for the respective
            persistence manager
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">

        </FileSystem>

        <!--
            Configures the persistence manager to be used for persisting version state.
            Please note that the current versioning implementation is based on
            a 'normal' persistence manager, but this could change in future
            implementations.
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager">

        </PersistenceManager>
    </Versioning>

    <!--
        Search index for content that is shared repository wide
        (/jcr:system tree, contains mainly versions)
    -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">

    </SearchIndex>

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">

    </DataStore>

</Repository>




Thomas Müller-2 wrote:
> 
> Hi,
> 
>> I am running garbage collection but it seems the files in the repository
>> are
>> only removed after I restart my application server (Tomcat).
> 
> Do you mean the files in the datastore directory? Or do you mean other
> files? How did you find out they are not deleted before, but deleted
> afterwards (using what tool)?
> 
>> I'm using this code for Garbage Collection:
> 
> The code looks good, but please note that this only affects the data
> store. Do you use a FileDataStore? If you are not sure, could you post
> the repository.xml and the Jackrabbit version you are using?
> 
> Regards,
> Thomas
> 
> 


Re: Garbage Collection not deleting until restart of server

Posted by Thomas Müller <th...@day.com>.
Hi,

> I am running garbage collection but it seems the files in the repository are
> only removed after I restart my application server (Tomcat).

Do you mean the files in the datastore directory? Or do you mean other
files? How did you find out they are not deleted before, but deleted
afterwards (using what tool)?

> I'm using this code for Garbage Collection:

The code looks good, but please note that this only affects the data
store. Do you use a FileDataStore? If you are not sure, could you post
the repository.xml and the Jackrabbit version you are using?

Regards,
Thomas