Posted to users@jackrabbit.apache.org by Eugeny N Dzhurinsky <bo...@redwerk.com> on 2006/05/05 12:30:28 UTC

session.save takes about 20 minutes

Hi there!
We are facing a strange issue: we are trying to save about 4000 nodes into a
repository configured with org.apache.jackrabbit.core.fs.local.LocalFileSystem
and org.apache.jackrabbit.core.state.xml.XMLPersistenceManager.

For some reason it takes about 20 minutes to save the session (we call
session.save after importing each ~400 nodes).

Any ideas how to speed this up?
-- 
Eugene N Dzhurinsky

Re: session.save takes about 20 minutes

Posted by Marcel Reutegger <ma...@gmx.net>.
Jackrabbit does not run with Lucene 1.9 because of a backward-compatibility
issue with the 1.9 release.

See: http://issues.apache.org/jira/browse/JCR-352

regards
  marcel

Eugeny N Dzhurinsky wrote:
> I'm using Lucene 1.9
> 
> Any ideas what's wrong there?
> 

Re: session.save takes about 20 minutes

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Fri, May 05, 2006 at 01:43:41PM +0200, Stefan Guggisberg wrote:
> don't use XMLPersistenceManager; you should use jr's default configuration
> (i.e. DerbyPersistenceManager) instead.

okay, I configured the repository like this:


<?xml version="1.0"?>
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository" />
    </FileSystem>

    <Security appName="Jackrabbit">
        <AccessManager class="cms.security.LuceneAccessManagerImpl">
            <param name="file" value="suxx" />

            <param name="test" value="suxx2" />
        </AccessManager>

        <LoginModule class="cms.auth.EasyLoginModule" />
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}" />
        </FileSystem>

        <PersistenceManager
        class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${wsp.home}/db;create=true" />

            <param name="schemaObjectPrefix" value="${wsp.name}_" />
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index" />
        </SearchIndex>
    </Workspace>

    <Versioning rootPath="${rep.home}/versions">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/versions" />
        </FileSystem>
        <PersistenceManager
        class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${rep.home}/version/db;create=true" />
            <param name="schemaObjectPrefix" value="version_" />
        </PersistenceManager>
    </Versioning>
</Repository>


And sometimes it throws an exception:
 [java] ERROR 08/11/06 05:11:55 [main] (ObservationManagerFactory:220) - Synchronous EventConsumer threw exception.
     [java] java.lang.ClassCastException: org.apache.lucene.store.FSIndexOutput
     [java]     at org.apache.lucene.store.Directory.createFile(Directory.java:67)
     [java]     at org.apache.jackrabbit.core.query.lucene.FSDirectory.createFile(FSDirectory.java:160)
     [java]     at org.apache.lucene.store.Directory.createOutput(Directory.java:75)
     [java]     at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:78)
     [java]     at org.apache.lucene.index.IndexWriter$1.doBody(IndexWriter.java:263)
     [java]     at org.apache.lucene.store.Lock$With.run(Lock.java:109)
     [java]     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:260)
     [java]     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:245)
     [java]     at org.apache.jackrabbit.core.query.lucene.AbstractIndex.<init>(AbstractIndex.java:104)
     [java]     at org.apache.jackrabbit.core.query.lucene.PersistentIndex.<init>(PersistentIndex.java:74)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.getOrCreateIndex(MultiIndex.java:477)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex$CreateIndex.execute(MultiIndex.java:1419)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:763)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.commitVolatileIndex(MultiIndex.java:804)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.checkVolatileCommit(MultiIndex.java:782)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:317)
     [java]     at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:279)
     [java]     at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:476)
     [java]     at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:230)
     [java]     at org.apache.jackrabbit.core.observation.ObservationManagerFactory.dispatchEvents(ObservationManagerFactory.java:218)
     [java]     at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:430)
     [java]     at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:602)
     [java]     at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:692)
     [java]     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:315)
     [java]     at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:291)
     [java]     at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:257)
     [java]     at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1189)
     [java]     at org.apache.jackrabbit.core.SessionImpl.save(SessionImpl.java:805)
     [java]     at tests.UserFileParsing.fileParsing(UserFileParsing.java:161)
     [java]     at tests.UserFileParsing.main(UserFileParsing.java:295)


and

 DEBUG 08/16/06 05:16:19 [main] (UserFileParsing:157) - 0%
     [java] Exception in thread "Timer-1" java.lang.ClassCastException: org.apache.lucene.store.FSIndexOutput
     [java]     at org.apache.lucene.store.Directory.createFile(Directory.java:67)
     [java]     at org.apache.jackrabbit.core.query.lucene.FSDirectory.createFile(FSDirectory.java:160)
     [java]     at org.apache.lucene.store.Directory.createOutput(Directory.java:75)
     [java]     at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:78)
     [java]     at org.apache.lucene.index.IndexWriter$1.doBody(IndexWriter.java:263)
     [java]     at org.apache.lucene.store.Lock$With.run(Lock.java:109)
     [java]     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:260)
     [java]     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:245)
     [java]     at org.apache.jackrabbit.core.query.lucene.AbstractIndex.<init>(AbstractIndex.java:104)
     [java]     at org.apache.jackrabbit.core.query.lucene.PersistentIndex.<init>(PersistentIndex.java:74)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.getOrCreateIndex(MultiIndex.java:477)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex$CreateIndex.execute(MultiIndex.java:1419)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:763)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.commitVolatileIndex(MultiIndex.java:804)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.flush(MultiIndex.java:683)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:929)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex.access$000(MultiIndex.java:72)
     [java]     at org.apache.jackrabbit.core.query.lucene.MultiIndex$1.run(MultiIndex.java:283)
     [java]     at java.util.TimerThread.mainLoop(Timer.java:512)
     [java]     at java.util.TimerThread.run(Timer.java:462)


I'm using Lucene 1.9

Any ideas what's wrong there?

-- 
Eugene N Dzhurinsky

Re: session.save takes about 20 minutes

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Wed, May 10, 2006 at 02:12:19PM +0200, Stefan Guggisberg wrote:
> >I have almost same results with DerbyPersistenceManager for now. It takes
> >about 15-20 minutes to flush... Any ideas?
> some guesses:
> - did you start with an empty repository? note that modifying the <Workspace>
>   element in repository.xml does not affect existing workspace.xml files.
> - is your jvm heap size appropriate?
> - how do you import those nodes? can you provide a test case?

I was just wondering: is there maybe another way to add a node together with a
set of properties, rather than adding the node first and then adding the
properties one by one?

-- 
Eugene N Dzhurinsky

Re: session.save takes about 20 minutes

Posted by Stefan Guggisberg <st...@gmail.com>.
there are a couple of issues with your test code; to name just a few:

- you're obviously measuring the time to execute the main method;
  this includes parsing a large xml file and repeatedly iterating over it.
  i don't know xom but i assume that it is similar to jdom; now
  building an entire dom tree in memory might be convenient for
  the programmer but it is certainly not the most efficient way of
  handling large xml data...
- the code that you provided has lots of room for improvement ;)

  just an example:
  return meta.getAttributeValue("value").trim().replaceAll(":",
                       "!").replaceAll("'", "!").replaceAll("\\/", "!")
                       .replaceAll("\"", "!").replaceAll("\\*", "!");
- you're calling session.save() every time you've added 10 nodes;
  save() is an expensive operation; calling save() e.g. every 1000 nodes
  is much more efficient
- you're doing *lots* of extensive string operations and dom tree traversals...
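
As a sketch of the kind of cleanup the replaceAll() example points at: each String.replaceAll() call compiles its regular expression again, so the chain compiles five patterns per node name. A single precompiled character class should give the same result. NameSanitizer and sanitize() are hypothetical names used only for illustration; they are not part of the posted code.

```java
import java.util.regex.Pattern;

final class NameSanitizer {

    // The same five characters the posted getName() replaces with '!':
    // colon, single quote, slash, double quote and asterisk.
    private static final Pattern FORBIDDEN = Pattern.compile("[:'/\"*]");

    // Trims the raw value and replaces every forbidden character in one pass.
    static String sanitize(String raw) {
        return FORBIDDEN.matcher(raw.trim()).replaceAll("!");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("  reports:2006/Q1*  "));
    }
}
```

The pattern is compiled once when the class is loaded, so the per-node cost is a single scan over the name.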


i'd guess that 99.9% of your claimed 20 minutes is spent in xml parsing,
string operations, etc., *not* in repository write operations.
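
One way to check this guess would be to time the parsing phase and the repository phase separately instead of timing main() as a whole. A minimal sketch, assuming the two phases can be wrapped in lambdas; PhaseTimer and timed() are hypothetical helpers, and the lambdas in main() are placeholders standing in for the XML parsing and the node import.

```java
import java.util.function.Supplier;

final class PhaseTimer {

    // Runs one phase, prints how long it took, and hands back its result.
    static <T> T timed(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        T result = phase.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Placeholder for building the document from the XML file.
        String document = timed("xml parsing", () -> "parsed-document");
        // Placeholder for adding nodes and calling save().
        int written = timed("repository writes", () -> document.length());
        System.out.println("wrote " + written + " items");
    }
}
```

If the "xml parsing" line dominates, the persistence manager is not the bottleneck.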

the following quick test run on my local machine (2.8 ghz pentium)
took about 60s for adding 10k nodes saved in batches of 1000:

    // 'root' is assumed to be the session's root node
    Node parent = root.addNode("foo", "nt:unstructured");
    long t0 = System.currentTimeMillis();
    for (int i = 1; i <= 10000; i++) {
        parent.addNode("foo" + i);
        if (i % 1000 == 0) {
            // persist the pending batch of 1000 nodes
            root.save();
            long t1 = System.currentTimeMillis();
            System.out.println("adding 1000 nodes took " + (t1 - t0) + "ms");
            t0 = System.currentTimeMillis();
        }
    }
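
For comparison, the flush condition in the test code posted in this thread (curr_record * 100 / record_count > curr_flush, with flush_step = 10) can be stepped through without a repository. The sketch below, which assumes only that integer arithmetic and uses the hypothetical names FlushSimulation and countSaves(), counts how often the loop would call session.save(). For 4000 records it suggests one save per 10% of the input, which matches the "~400 nodes" figure from the original post, rather than one save every 10 nodes.

```java
final class FlushSimulation {

    // Counts the save() calls made inside the import loop, using the same
    // integer arithmetic as the posted fileParsing() method.
    static int countSaves(int recordCount, int flushStep) {
        int currFlush = flushStep;
        int saves = 0;
        for (int currRecord = 1; currRecord <= recordCount; currRecord++) {
            if (currRecord * 100 / recordCount > currFlush) {
                currFlush += flushStep;
                saves++;
            }
        }
        return saves;
    }

    public static void main(String[] args) {
        // The final session.save() after the loop is not counted here.
        System.out.println("in-loop saves: " + countSaves(4000, 10));
    }
}
```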


Re: session.save takes about 20 minutes

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Wed, May 10, 2006 at 02:12:19PM +0200, Stefan Guggisberg wrote:
> >I get almost the same results with DerbyPersistenceManager. It still takes
> >about 15-20 minutes to flush... Any ideas?
> 
> some guesses:
> - did you start with an empty repository? note that modifying the <Workspace>
>   element in repository.xml does not affect existing workspace.xml files.

Yes, I am removing the entire contents of the repository directory.
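
To make sure every run really starts from an empty repository (including any
old workspace.xml), the directory can also be wiped from code before the
import. This is only a sketch using plain java.nio (Java 7+); the class name
WipeRepositoryDir and the method deleteContents are made up for illustration
and are not part of Jackrabbit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: empties a repository home directory before a test run.
class WipeRepositoryDir {

    // Deletes everything below dir but keeps dir itself.
    static void deleteContents(final Path dir) {
        if (!Files.isDirectory(dir)) {
            return; // nothing to clean up
        }
        try {
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    Files.delete(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException exc)
                        throws IOException {
                    if (exc != null) {
                        throw exc;
                    }
                    // Directories are removed after their children; the root stays.
                    if (!d.equals(dir)) {
                        Files.delete(d);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The path is an assumption; pass the real repository home as an argument.
        deleteContents(Paths.get(args.length > 0 ? args[0] : "repository-home"));
    }
}
```

The repository must be shut down before this runs, otherwise Derby and the
search index may still hold files open.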

> - is your jvm heap size appropriate?

-Xms128m -Xmx512m
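
To double-check that these flags actually reach the JVM that runs the import
(and not just some launcher script), the limits can be printed at startup. A
small sketch using only java.lang.Runtime; the class name HeapReport is made
up for illustration.

```java
// Prints the heap limits the running JVM actually got.
class HeapReport {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx, totalMemory() is the currently reserved heap.
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

The reported maximum is usually a little below the -Xmx value, which is
normal.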

> - how do you import those nodes? can you provide a test case?

Well, it's not a true test case, but it should give you an idea. We are
parsing a large XML file (~20 megabytes) and adding nodes to the repository.


package tests;

import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;

import org.apache.jackrabbit.core.NodeImpl;
import org.apache.jackrabbit.uuid.UUID;
import org.apache.log4j.Logger;

import cms.auth.FileJRGroupBuilder;
import cms.auth.JRGroup;
import cms.helper.ConfigurationHelper;
import cms.helper.ResourceCreationHelper;
import cms.security.NodeACL;
import cms.security.NodeACLFactory;
import cms.security.TransientRepositoryFactory;

public class UserFileParsing {

    private List groups;

    private Stack uuids;

    private int last_lenght;

    private Node root_node;

    private static List unuser_fields;

    private int record_count;

    Logger log = Logger.getLogger(UserFileParsing.class.getName());

    static {
        unuser_fields = new LinkedList();
        unuser_fields.add("objectname");
    }

    class IdStructure {
        private String UUID;

        private String code;

        public IdStructure(String UUID, String code) {
            this.UUID = UUID;
            this.code = code;
        }

        public String getCode() {
            return code;
        }

        public String getUUID() {
            return UUID;
        }

        public String toString() {
            return "[" + UUID + ":" + code + "]";
        }

    };

    public UserFileParsing() throws Exception {
        groups = new LinkedList();
        uuids = new Stack();
        last_lenght = 0;
        record_count = 0;
    }

    private void insertGroups() {

        FileJRGroupBuilder builder = new FileJRGroupBuilder();
        Iterator it = groups.iterator();
        while (it.hasNext()) {
            String gr_name = (String) it.next();
            if (gr_name.length() > 0)
                try {
                    builder.addGroup(gr_name);
                    // Thread.sleep(1);
                } catch (Exception e) {
                    log.error(e, e);
                }

        }
        builder.commit();
    }

    /**
     * Parses XML file and stores data in the repository
     * @param filename
     *            name of file to parse
     */
    private void fileParsing(String filename) throws Exception {
        Builder bld = new Builder();
        Document document = bld.build(filename);
        Element rootElement = document.getRootElement(); // treedump
        Elements tree = rootElement.getFirstChildElement("ds")
                .getFirstChildElement("tree").getChildElements();
        for (int i = 0; i < tree.size(); i++) {
            Element node = tree.get(i);
            extractGroups(node);
            record_count++;
        }

        if (log.isDebugEnabled())
            log.debug("NUMBER OF NODES: " + record_count);
        insertGroups();
        // list of nodes ---
        // Standard JAAS system property for the login configuration file
        System.setProperty("java.security.auth.login.config",
                "conf/jaas.config");
        ConfigurationHelper ch = new ConfigurationHelper(ResourceCreationHelper
                .getResourcePath(UserFileParsing.class, "/security.properties",
                        true));
        String CONFIG_FILE = ch.getRepositoryCfgFile();
        String DIRECTORY = ch.getRepositoryDir();
        // Set up a Jackrabbit repository with the specified
        // configuration file and repository directory
        Repository repository = TransientRepositoryFactory.getInstance(
                CONFIG_FILE, DIRECTORY);
        String username = "username";
        String password = "password";
        // Login to the default workspace as a dummy user
        Session session = repository.login(new SimpleCredentials(username,
                password.toCharArray()));
        root_node = session.getRootNode();
        root_node.addMixin("mix:referenceable");
        int curr_record = 0;
        int flush_step = 10;
        int curr_flush = 10;

        NodeACLFactory xmlf = NodeACLFactory.getInstance();
        xmlf.setSystem(true);
        for (int i = 0; i < tree.size(); i++) {
            Element node = tree.get(i);
            putNode(node, xmlf);
            curr_record++;
            if (log.isDebugEnabled())
                log.debug(curr_record * 100 / record_count + "%");
            if (curr_record * 100 / record_count > curr_flush) {
                if (log.isDebugEnabled())
                    log.debug("flushing ....");
                curr_flush += flush_step;
                session.save();
            }

        }
        session.save();
        session.logout();
    }

    /**
     * Retrieves node name
     * @param el
     *            XML node object
     * @return name of the node
     */
    private String getName(Element el) {
        Elements els = el.getChildElements();
        for (int i = 0; i < els.size(); i++) {
            Element meta = els.get(i);
            if (meta.getLocalName().toLowerCase().equals("meta")
                    && meta.getAttributeValue("name").toLowerCase().equals(
                            "objectname"))
                return meta.getAttributeValue("value").trim().replaceAll(":",
                        "!").replaceAll("'", "!").replaceAll("\\/", "!")
                        .replaceAll("\"", "!").replaceAll("\\*", "!");
        }
        return "NULL";
    }

    /**
     * @return location of given node
     * @param el
     *            node XML object
     */
    private String getLocation(Element el) {
        Elements els = el.getChildElements();
        for (int i = 0; i < els.size(); i++) {
            Element meta = els.get(i);
            if (meta.getLocalName().toLowerCase().equals("meta")
                    && meta.getAttributeValue("name").toLowerCase().equals(
                            "nodelocation"))
                return meta.getAttributeValue("value");
        }
        return "NULL";

    }

    /**
     * Adds attributes to a node
     * @param node
     * @param el
     *            XML element we need extract parameters from
     */
    private void putAttributes(Node node, Element el) throws Exception {
        Elements els = el.getChildElements();
        for (int i = 0; i < els.size(); i++) {
            Element meta = els.get(i);
            if (meta.getLocalName().toLowerCase().equals("meta")
                    && !unuser_fields.contains(meta.getAttributeValue("name")
                            .toLowerCase())) {
                if (log.isDebugEnabled())
                    log.debug("Added property "
                            + meta.getAttributeValue("name"));
                node.setProperty(meta.getAttributeValue("name"), meta
                        .getAttributeValue("value"));
            }
        }
    }

    /**
     * Parses privileges from given string
     * @param priv
     *            privileges string
     * @return list of privileges
     */
    private List parsePriv(String priv) {
        List ret = new LinkedList();
        String[] privil = priv.split(" ");
        for (int i = 0; i < privil.length; i++)
            if (privil[i].length() > 0)
                ret.add(privil[i]);
        return ret;
    }

    /**
     * Stores ACLs for given node
     * @param node
     *            XML node object
     * @param node_uuid
     *            UUID of the node
     * @param parent_uuid
     *            UUID of parent node
     * @param factory
     *            node ACL factory to be used
     */
    private void putACLs(Element node, UUID node_uuid, UUID parent_uuid,
            NodeACLFactory factory) throws Exception {
        JRGroup jr_group = new FileJRGroupBuilder().getInstance();

        Element el = node.getFirstChildElement("rights");
        Elements els = el.getChildElements();
        HashMap acls = new HashMap();

        for (int i = 0; i < els.size(); i++) {
            Element right = els.get(i);
            if (right.getLocalName().equals("right")
                    && right.getAttributeValue("groupname").length() > 0) {
                int scan = -1;
                int read = -1;
                int write = -1;
                int add = -1;
                int delete = -1;
                List privileges = parsePriv(right.getAttributeValue("rights"));
                if (privileges.contains("scan"))
                    scan = 1;
                if (privileges.contains("read"))
                    read = 1;
                if (privileges.contains("write"))
                    write = 1;
                if (privileges.contains("add"))
                    add = 1;
                if (privileges.contains("delete"))
                    delete = 1;
                acls.put(jr_group.getGroupByName(
                        right.getAttributeValue("groupname")).getId(),
                        new NodeACL(scan, read, write, delete, add, 1, 1));
            }
        }
        // add all privileges for default group
        acls.put(jr_group.getGroupByName("default_group").getId(), new NodeACL(
                1, 1, 1, 1, 1, 1, 1));
        factory.addACL(parent_uuid, node_uuid, acls);
    }

    /**
     * Stores node into repository
     * @param node
     *            XML node object
     * @param factory
     *            ACL factory to be used
     */
    private void putNode(Element node, NodeACLFactory factory) throws Exception {

        Elements els = node.getChildElements();
        Element last_version = null;

        for (int i = 0; i < els.size(); i++) {
            Element version_right = els.get(i);
            if (version_right.getLocalName().equals("version"))
                last_version = els.get(i);
        }
        Node parent_node = root_node;
        String location = getLocation(last_version);
        if (!uuids.isEmpty()) {

            int times = (last_lenght - location.length()) / 4 + 1;
            for (int i = 0; i < times; i++)
                uuids.pop();
            last_lenght = location.length();
            IdStructure struct = (IdStructure) uuids.peek();
            parent_node = root_node.getNode(struct.getUUID().substring(1,
                    struct.getUUID().length()));
        }
        Node new_node = parent_node.addNode(getName(last_version));
        new_node.addMixin("mix:referenceable");
        // Node new_node = parent_node.addNode(getName(last_version));
        // long start_time = System.currentTimeMillis();
        if (log.isDebugEnabled())
            log.debug("Saving ACLs for node");
        putACLs(node, ((NodeImpl) new_node).internalGetUUID(),
                ((NodeImpl) parent_node).internalGetUUID(), factory);

        // log.debug("TIME: " + (System.currentTimeMillis() - start_time));
        if (log.isDebugEnabled())
            log.debug("Saving attributes for node");
        putAttributes(new_node, last_version);
        uuids.push(new IdStructure(new_node.getPath(), location));
    }

    /**
     * Extracts groups from node and adds it to global list of groups
     * @param node
     *            XML node object
     */
    private void extractGroups(Element node) {
        Elements rights = node.getFirstChildElement("rights")
                .getChildElements();
        for (int i = 0; i < rights.size(); i++) {
            Element rig = rights.get(i);
            String group_name = rig.getAttributeValue("groupname");
            if (!groups.contains(group_name))
                groups.add(group_name);
        }
    }

    public static void main(String[] args) throws Exception {
        UserFileParsing parsing = new UserFileParsing();
        parsing.fileParsing("conf/sampledata.xml");
    }
}
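
To see where the 15-20 minutes actually go, each flush could be timed
separately instead of only looking at the total. Below is a minimal sketch
with no Jackrabbit dependency: FlushTimer is a made-up helper, and the
Runnable only stands in for session.save() (which throws a checked
RepositoryException, so the real call would need a try/catch inside run()).
If the individual saves are fast, the time is being spent elsewhere, for
example in putACLs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a flush action every batchSize records and
// remembers how long each flush took.
class FlushTimer {
    private final int batchSize;
    private final Runnable flush; // stands in for session.save()
    private final List<Long> flushMillis = new ArrayList<Long>();
    private int pending = 0;

    FlushTimer(int batchSize, Runnable flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flush = flush;
    }

    // Call once per imported record.
    void recordAdded() {
        pending++;
        if (pending >= batchSize) {
            flushNow();
        }
    }

    // Call once at the end to flush a partial last batch.
    void finish() {
        if (pending > 0) {
            flushNow();
        }
    }

    private void flushNow() {
        long start = System.nanoTime();
        flush.run();
        flushMillis.add((System.nanoTime() - start) / 1000000L);
        pending = 0;
    }

    // Duration of each flush in milliseconds, in call order.
    List<Long> flushMillis() {
        return flushMillis;
    }

    public static void main(String[] args) {
        FlushTimer timer = new FlushTimer(400, new Runnable() {
            public void run() {
                // session.save() would go here
            }
        });
        for (int i = 0; i < 4000; i++) {
            timer.recordAdded(); // putNode(...) would be called before this
        }
        timer.finish();
        System.out.println("flush times (ms): " + timer.flushMillis());
    }
}
```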

-- 
Eugene N Dzhurinsky

Re: session.save takes about 20 minutes

Posted by Stefan Guggisberg <st...@gmail.com>.
On 5/10/06, Eugeny N Dzhurinsky <bo...@redwerk.com> wrote:
> On Fri, May 05, 2006 at 01:43:41PM +0200, Stefan Guggisberg wrote:
> > On 5/5/06, Eugeny N Dzhurinsky <bo...@redwerk.com> wrote:
> > >Hi there!
> > >We are facing some strange issues: there are about 4000 nodes we are
> > >trying to
> > >save into repository with
> > >org.apache.jackrabbit.core.fs.local.LocalFileSystem
> > >and org.apache.jackrabbit.core.state.xml.XMLPersistenceManager
> > >
> > >for some reason it takes about 20 minutes to save session (we are doing
> > >session.save after we imported each ~400 nodes).
> > >
> > >Any ideas how to speed this up?
> >
> > don't use XMLPersistenceManager; you should use jr's default configuration
> > (i.e. DerbyPersistenceManager) instead.
>
> I get almost the same results with DerbyPersistenceManager. It still takes
> about 15-20 minutes to flush... Any ideas?

some guesses:
- did you start with an empty repository? note that modifying the <Workspace>
  element in repository.xml does not affect existing workspace.xml files.
- is your jvm heap size appropriate?
- how do you import those nodes? can you provide a test case?


Re: session.save takes about 20 minutes

Posted by Eugeny N Dzhurinsky <bo...@redwerk.com>.
On Fri, May 05, 2006 at 01:43:41PM +0200, Stefan Guggisberg wrote:
> On 5/5/06, Eugeny N Dzhurinsky <bo...@redwerk.com> wrote:
> >Hi there!
> >We are facing some strange issues: there are about 4000 nodes we are 
> >trying to
> >save into repository with 
> >org.apache.jackrabbit.core.fs.local.LocalFileSystem
> >and org.apache.jackrabbit.core.state.xml.XMLPersistenceManager
> >
> >for some reason it takes about 20 minutes to save session (we are doing
> >session.save after we imported each ~400 nodes).
> >
> >Any ideas how to speed this up?
> 
> don't use XMLPersistenceManager; you should use jr's default configuration
> (i.e. DerbyPersistenceManager) instead.

I get almost the same results with DerbyPersistenceManager. It still takes
about 15-20 minutes to flush... Any ideas?

Here is my repository config below:

<?xml version="1.0"?>
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository" />
    </FileSystem>

    <Security appName="Jackrabbit">
        <AccessManager class="cms.security.LuceneAccessManagerImpl"/>
        <LoginModule class="cms.auth.EasyLoginModule" />
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}" />
        </FileSystem>

        <PersistenceManager
        class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${wsp.home}/db;create=true" />

            <param name="schemaObjectPrefix" value="${wsp.name}_" />
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index" />
        </SearchIndex>
    </Workspace>

    <Versioning rootPath="${rep.home}/versions">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/versions" />
        </FileSystem>
        <PersistenceManager
        class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${rep.home}/version/db;create=true" />
            <param name="schemaObjectPrefix" value="version_" />
        </PersistenceManager>
    </Versioning>
</Repository>


-- 
Eugene N Dzhurinsky

Re: session.save takes about 20 minutes

Posted by Stefan Guggisberg <st...@gmail.com>.
On 5/5/06, Eugeny N Dzhurinsky <bo...@redwerk.com> wrote:
> Hi there!
> We are facing some strange issues: there are about 4000 nodes we are trying to
> save into repository with org.apache.jackrabbit.core.fs.local.LocalFileSystem
> and org.apache.jackrabbit.core.state.xml.XMLPersistenceManager
>
> for some reason it takes about 20 minutes to save session (we are doing
> session.save after we imported each ~400 nodes).
>
> Any ideas how to speed this up?

don't use XMLPersistenceManager; you should use jr's default configuration
(i.e. DerbyPersistenceManager) instead.

cheers
stefan

> --
> Eugene N Dzhurinsky
>