Posted to users@jackrabbit.apache.org by John Langley <di...@gmail.com> on 2012/04/02 18:24:11 UTC

example of running the Garbage Collector?

Does anyone have a simple example of running the GarbageCollector in
the same process as their repository?
It seems like this is a best practice, but I don't see any examples of it.
If it matters we're running 2.5.5.


Thanks in advance!
-- Langley

AW: example of running the Garbage Collector?

Posted by "Cech. Ulrich" <Ul...@aeb.de>.
Hi John,

I wrote the following class for running garbage collection on the data store. I don't need a transient repository with an extra login for this. I only use the actual RepositoryManager(s).

Bye,
Ulrich
 

*************************************************************
import java.util.List;

import javax.jcr.RepositoryException;

import org.apache.jackrabbit.api.management.DataStoreGarbageCollector;
import org.apache.jackrabbit.api.management.RepositoryManager;

import edu.umd.cs.findbugs.annotations.SuppressWarnings;

/**
 * The data store never deletes entries except when running data store garbage
 * collection. Similar to Java heap garbage collection, data store garbage
 * collection will first mark all used entries, and later remove unused items.
 * <p>
 * Data store garbage collection does not delete entries if the identifier is
 * still in the Java heap memory. To delete as many unreferenced entries as
 * possible, call System.gc() a few times before running the data store garbage
 * collection. Please note System.gc() does not guarantee all objects are
 * garbage collected.
 * 
 * @author cech
 * 
 */
public class DataStoreGC extends Thread {
    /**
     * The logger for this class.
     */
    private static org.apache.log4j.Logger logger =
            org.apache.log4j.Logger.getLogger(DataStoreGC.class);
    
    private List<RepositoryManager> repositoryManagers;

    /**
     * Constructs a new <code>DataStoreGC</code> with a list of
     * <code>RepositoryManager</code> for which the garbage collection
     * should run.
     */
    public DataStoreGC(List<RepositoryManager> rms) {
        super();
        if (rms == null || rms.size() < 1) {
            throw new IllegalArgumentException(
                    "The list of repository managers is empty.");
        }
        setRepositoryManagers(rms);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void run() {
        // Loop until this thread is interrupted.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                System.out.println("Running DataStoreGC...");
                runDataStoreGarbageCollector();
                System.out.println("DataStoreGC done.");
                Thread.sleep(10000);
            } catch (InterruptedException irex) {
                // Restore the interrupt flag so the loop condition ends the thread.
                Thread.currentThread().interrupt();
            } catch (RepositoryException repoex) {
                logger.error("Error while communicating with the repository.",
                        repoex);
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    /**
     * Runs the garbage collector for the configured RepositoryManagers. If
     * multiple repositories use the same data store, pass all of their
     * RepositoryManagers to the constructor.
     * 
     * @return the number of data store records deleted by the sweep
     * @throws RepositoryException if a repository operation fails
     */
    @SuppressWarnings(value="DM_GC")
    private int runDataStoreGarbageCollector()
            throws RepositoryException {
        int result = 0;
        List<RepositoryManager> rms = getRepositoryManagers();
        if (rms == null || rms.size() < 1) {
            throw new IllegalArgumentException(
                    "The list of repository managers is empty.");
        }
        DataStoreGarbageCollector[] gcs =
            new DataStoreGarbageCollector[rms.size()];
        for (int i = 0; i < 5; i++) {
            System.gc(); //ignore FindBugs warning here
        }
        for (int i = 0; i < rms.size(); ++i) {
            gcs[i] = rms.get(i).createDataStoreGarbageCollector();
        }
        try {
            /* Mark all records in all repositories */
            for (DataStoreGarbageCollector gc : gcs) {
                gc.mark();
            }
            /* Important to call sweep() on the first GarbageCollector */
            result = gcs[0].sweep();
        } finally {
            for (DataStoreGarbageCollector gc : gcs) {
                gc.close();
            }
        }
        return result;
    }

    private List<RepositoryManager> getRepositoryManagers() {
        return repositoryManagers;
    }

    private void setRepositoryManagers(List<RepositoryManager> repositoryManagers) {
        this.repositoryManagers = repositoryManagers;
    }

}
*************************************************************
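
The order used in the class above (mark with every collector, then sweep once with the first one) can be illustrated with a toy in-memory model. This is only a sketch under assumptions, not Jackrabbit code: ToyDataStore, ToyCollector and ToyGcDemo are hypothetical names, and the model assumes that mark() touches every referenced record and that sweep() deletes every record not touched since that collector's mark began. A logical counter stands in for timestamps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy shared data store (hypothetical): record id -> logical time of last touch. */
final class ToyDataStore {
    private final Map<String, Long> lastTouched = new HashMap<>();
    private long clock = 0;

    long tick() { return ++clock; }

    void add(String id) { lastTouched.put(id, tick()); }

    void touch(String id) {
        if (lastTouched.containsKey(id)) {
            lastTouched.put(id, tick());
        }
    }

    /** Deletes every record last touched before the given logical time. */
    int deleteAllOlderThan(long timestamp) {
        int deleted = 0;
        Iterator<Long> it = lastTouched.values().iterator();
        while (it.hasNext()) {
            if (it.next() < timestamp) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    boolean contains(String id) { return lastTouched.containsKey(id); }
}

/** Toy collector (hypothetical) for one repository that uses the shared store. */
final class ToyCollector {
    private final ToyDataStore store;
    private final List<String> referenced;
    private long markStart = 0; // sweep() before mark() deletes nothing

    ToyCollector(ToyDataStore store, List<String> referenced) {
        this.store = store;
        this.referenced = referenced;
    }

    void mark() {
        markStart = store.tick();
        for (String id : referenced) {
            store.touch(id);
        }
    }

    int sweep() { return store.deleteAllOlderThan(markStart); }
}

final class ToyGcDemo {
    /** Marks with every collector, then sweeps once with the first one. */
    static int collect(List<ToyCollector> gcs) {
        for (ToyCollector gc : gcs) {
            gc.mark();
        }
        return gcs.get(0).sweep();
    }

    public static void main(String[] args) {
        ToyDataStore store = new ToyDataStore();
        store.add("a"); // referenced by repository 1
        store.add("b"); // referenced by repository 2
        store.add("c"); // referenced by nobody
        ToyCollector gc1 = new ToyCollector(store, Arrays.asList("a"));
        ToyCollector gc2 = new ToyCollector(store, Arrays.asList("b"));
        System.out.println("deleted=" + collect(Arrays.asList(gc1, gc2)));
    }
}
```

In this model, sweeping with the first collector uses the earliest mark start, so records touched by any later mark() survive. Sweeping with gc2 instead would also delete "a", because "a" was touched before gc2 started marking.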



Re: example of running the Garbage Collector?

Posted by John Langley <di...@gmail.com>.
Thanks Alex, however I was hoping for something different.

The approach used in the test code is to create a separate transient
repository.
That is something we've been doing for a while. When we create this
repository we make it join as part of a cluster, which of course means
it needs to index all the files, affects the journal table, etc.

What I was hoping for is something we could just run in the same
environment as our primary jackrabbit instance, i.e. with the same
repository that the RepositoryAccessServlet uses. In fact, I'd love to
get the repository in that same way.

Here's some code I tried, but it never did the garbage collection,
even though it "seems" to work, i.e. runs w/out failure.

    protected String servletRepositoryGC() {
        String retVal = "fail.";
        Session session = null;
        DataStoreGarbageCollector garbageCollector = null;
        Repository repository =
                RepositoryAccessServlet.getRepository(getServletContext());
        try {
            // Credentials to create a valid session for the user to access
            // the repository's DataStoreGarbageCollector.
            String userName = "admin";
            String password = "somepassword";
            session = repository.login(
                    new SimpleCredentials(userName, password.toCharArray()));
            garbageCollector =
                    ((SessionImpl) session).createDataStoreGarbageCollector();

            logger.info(">>>> HACK >>>> Mark Phase Is Starting.");
            garbageCollector.mark();

            logger.info(">>>> HACK >>>> Mark Phase Is Complete. "
                    + "Sweep Phase Is Starting.");
            garbageCollector.sweep();

            logger.info(">>>> HACK >>>> Sweep Phase Is Complete.");
            retVal = "ok.";
        } catch (Exception e) {
            logger.severe(">>>> HACK >>>> Exception while Garbage Collection "
                    + e.toString());
        } finally {
            // Guard against a failed login or collector creation.
            if (garbageCollector != null) {
                garbageCollector.close();
            }
            if (session != null) {
                session.logout();
            }
        }
        return retVal;
    }

The measure of true garbage collection for us is select count(*) on
the DATASTORE table in the RDBMS (mysql in our case). We can see the
count go down when GC runs correctly, which it does if we start the
transient repository as described earlier.

So... how ~do~ people with production Jackrabbit servers do Garbage
Collection? It seems like there are only 3 options:
1) Shut down your repository and use the technique that the unit test
demonstrates.
2) Leave your repository running, but add a transient repository as a
cluster member (this means your primary JR instance must run in cluster
mode too). You can then do this as a separate process and run it
with a cronjob or on demand.
3) Find a way to run it in process on a timer thread with the same
repository definition as the main JR instance. This is the option I'm
looking for.
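
To make option 3 concrete, here is a minimal sketch of the timer-thread part only, using plain java.util.concurrent. MarkSweepCollector, CollectorFactory and ScheduledDataStoreGc are hypothetical names invented for this illustration. The interface only mirrors the mark(), sweep() and close() calls used elsewhere in this thread, and how the factory obtains a collector from the running repository is left open.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the mark/sweep/close calls used in this thread. */
interface MarkSweepCollector {
    void mark() throws Exception;
    int sweep() throws Exception;
    void close();
}

/** Hypothetical factory; a real one would wrap the already running repository. */
interface CollectorFactory {
    MarkSweepCollector create() throws Exception;
}

/** Runs one mark-and-sweep pass periodically on a single daemon thread. */
final class ScheduledDataStoreGc implements AutoCloseable {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "datastore-gc");
                t.setDaemon(true); // do not keep the JVM alive for GC alone
                return t;
            });
    private final CollectorFactory factory;
    private final AtomicInteger completedRuns = new AtomicInteger();

    ScheduledDataStoreGc(CollectorFactory factory) {
        this.factory = factory;
    }

    /** Schedules passes; the delay is measured from the end of the previous pass. */
    void start(long initialDelay, long delay, TimeUnit unit) {
        timer.scheduleWithFixedDelay(() -> { runOnce(); }, initialDelay, delay, unit);
    }

    /**
     * Runs one pass and returns the sweep result, or -1 on failure.
     * Exceptions are caught here because an exception escaping a scheduled
     * task would cancel all later runs.
     */
    int runOnce() {
        MarkSweepCollector gc = null;
        try {
            gc = factory.create();
            gc.mark();
            int deleted = gc.sweep();
            completedRuns.incrementAndGet();
            return deleted;
        } catch (Exception e) {
            System.err.println("Data store GC failed: " + e);
            return -1;
        } finally {
            if (gc != null) {
                gc.close();
            }
        }
    }

    int completedRuns() {
        return completedRuns.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

With a suitable factory, something like new ScheduledDataStoreGc(factory).start(1, 24, TimeUnit.HOURS) would run a pass once a day. The interval is an arbitrary example value, not a recommendation.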

Thanks everyone! I think everyone who does a fair number of writes
will benefit from this. For our application we do LOTS of writes, so
GC is essential.

-- Langley
--------------------------------------------------------------------------------------------------------------
From: Alex Parvulescu [alex.parvulescu@gmail.com]
Sent: Monday, April 02, 2012 3:42 PM
To: users@jackrabbit.apache.org
Subject: Re: example of running the Garbage Collector?

Hi John,

This could get you started:
http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/data/DataStoreAPITest.java


best,
alex

Re: example of running the Garbage Collector?

Posted by Alex Parvulescu <al...@gmail.com>.
Hi John,

This could get you started:
http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/data/DataStoreAPITest.java


best,
alex

