Posted to dev@jackrabbit.apache.org by Jérôme BENOIS <be...@argia-engineering.fr> on 2006/01/06 17:07:00 UTC

Newbies question

Hello,

	I want to use Jackrabbit to create 180000 content nodes. How should I
use the session correctly? Should I call session.save() for each node, or
is it better to call session.save() once per block of 1000 nodes?
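
To make the two options concrete, here is a minimal sketch of the
"save once per block" variant. It does not use the real JCR API: SaveTarget
is a hypothetical stand-in for the parts of javax.jcr.Session and
javax.jcr.Node an import loop would touch, and the names importNodes and
batchSize are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the JCR calls used by an import loop. */
interface SaveTarget {
    void addNode(String name); // stands in for adding a child node
    void save();               // stands in for Session.save()
}

final class BatchSaveSketch {

    /**
     * Adds nodeCount nodes and calls save() once per batchSize nodes,
     * plus one final save() for any remainder.
     * Returns the number of save() calls made.
     */
    static int importNodes(SaveTarget target, int nodeCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (int i = 0; i < nodeCount; i++) {
            target.addNode("node" + i);
            pending++;
            if (pending == batchSize) {
                target.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {
            target.save(); // flush the last, partially filled block
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        SaveTarget fake = new SaveTarget() {
            public void addNode(String name) { calls.add("add " + name); }
            public void save() { calls.add("save"); }
        };
        // 180000 nodes in blocks of 1000 gives 180 save() calls.
        System.out.println("save() calls: " + importNodes(fake, 180000, 1000));
    }
}
```

With batchSize set to 1 the same loop degenerates to the "save for each
node" variant, so the two approaches can be timed against each other.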

Best regards,
Jérôme.

Re: Newbies question

Posted by Stefan Guggisberg <st...@gmail.com>.
On 1/18/06, Jérôme BENOIS <be...@argia-engineering.fr> wrote:
> Hi,
>
>         Thanks for your response.
>
>         I replaced my PM with DerbyPM, and an error occurred when I ran my test:
>         INFO main fr.openmodel.cms.imports.process.Impl_ImportMgtProcess -
> inserted count = 18500/50000
> ERROR main
> org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager - failed
> to write property state: f08c7241-032f-4515-a0cd-c8592649b45e

note that the exception msg is wrong, i fixed it. the problem occurred while
storing node references. you seem to have a *lot* of reference properties
referring to the same target node. i suggest you either modify your data model
or modify the schema of the DerbyPersistenceManager (see the
derby.ddl file for an example). you can e.g. change the following line

create table ${schemaObjectPrefix}REFS (NODE_ID char(36) not null,
REFS_DATA blob not null)

to

create table ${schemaObjectPrefix}REFS (NODE_ID char(36) not null,
REFS_DATA blob(10M) not null)

cheers
stefan

Re: Newbies question

Posted by Jérôme BENOIS <be...@argia-engineering.fr>.
Hi,

	Thanks for your response.

	I replaced my PM with DerbyPM, and an error occurred when I ran my test:
INFO main fr.openmodel.cms.imports.process.Impl_ImportMgtProcess - inserted count = 18500/50000
ERROR main org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager - failed to write property state: f08c7241-032f-4515-a0cd-c8592649b45e
ERROR 22001: A truncation error was encountered trying to shrink BLOB 'XX-RESOLVE-XX' to length 1048576.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.iapi.types.SQLBinary.checkHostVariable(Unknown Source)
        at org.apache.derby.exe.ac597e80e5x0108xdd95x2506x0000001861b8c.e3(Unknown Source)
        at org.apache.derby.impl.services.reflect.DirectCall.invoke(Unknown Source)
        at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(Unknown Source)
        at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.DMLWriteResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.UpdateResultSet.collectAffectedRows(Unknown Source)
        at org.apache.derby.impl.sql.execute.UpdateResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
        at org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager.store(SimpleDbPersistenceManager.java:779)
        at org.apache.jackrabbit.core.state.AbstractPersistenceManager.store(AbstractPersistenceManager.java:87)
        at org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager.store(SimpleDbPersistenceManager.java:446)
        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:562)
        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:680)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:322)
        at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:322)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:298)
        at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:260)
        at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1190)
        at org.apache.jackrabbit.core.SessionImpl.save(SessionImpl.java:758)
        at fr.openmodel.cms.contentunit.entities.Impl_ContentUnitEntityMgr.saveCUs(Impl_ContentUnitEntityMgr.java:709)

	I also tested with SimpleDbPersistenceManager and a PostgreSQL 8
database. I created 50000 nodes with 3 properties each in 25 minutes, but
when I run this simple query it takes a very long time:

DEBUG main org.apache.jackrabbit.core.query.lucene.Recovery - RedoLog is empty, no recovery needed.
INFO main org.apache.jackrabbit.core.query.lucene.SearchIndex - Index initialized: /opt/jackrabbit/repotest/workspaces/default/index
DEBUG main org.apache.jackrabbit.core.query.lucene.QueryImpl - Executing query:
+ Root node
+ Select properties: *
  + PathQueryNode
    + LocationStepQueryNode:  NodeTest={} Descendants=false Index=NONE
    + LocationStepQueryNode:  NodeTest=* Descendants=true Index=NONE
      + AndQueryNode
        + NodeTypeQueryNode: Prop={http://www.jcp.org/jcr/1.0}primaryType Value={http://www.jcp.org/jcr/nt/1.0}unstructured
        + AndQueryNode
          + RelationQueryNode: Op: LIKE Prop={}email Type=STRING Value=%jean%

INFO main org.apache.jackrabbit.core.query.lucene.DocNumberCache - size=64/1024, #accesses=1001, #hits=937, #misses=64, cacheRatio=94%
--------> nb result 1563 in 25.0 seconds

And if I test with another query, it is very fast:

DEBUG main org.apache.jackrabbit.core.query.lucene.QueryImpl - Executing query:
+ Root node
+ Select properties: *
  + PathQueryNode
    + LocationStepQueryNode:  NodeTest={} Descendants=false Index=NONE
    + LocationStepQueryNode:  NodeTest=* Descendants=true Index=NONE
      + AndQueryNode
        + NodeTypeQueryNode: Prop={http://www.jcp.org/jcr/1.0}primaryType Value={http://www.jcp.org/jcr/nt/1.0}unstructured
        + AndQueryNode
          + RelationQueryNode: Op: LIKE Prop={}email Type=STRING Value=%wanadoo%

--------> nb result 2 in 0.0 seconds


The query time seems to depend on the number of results. Are all results
loaded into memory?

Regards,
Jérôme.



Re: Newbies question

Posted by Stefan Guggisberg <st...@gmail.com>.
is there a specific reason for using ObjectPersistenceManager?

if you use jr's 'default' persistence manager (DerbyPersistenceManager)
you should see much better performance.

there's a sample configuration in svn:
jackrabbit/src/main/config/repository.xml

cheers
stefan


Re: Newbies question

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Jérôme,

the logs indicate that jackrabbit still orders results in document order:

org.apache.jackrabbit.core.query.lucene.DocOrderNodeIteratorImpl - 1537
node(s) ordered in 331882 ms

I assume you forgot to also apply the configuration change to any 
existing workspace.xml.

keep in mind that the repository.xml file has two purposes:
- configure repository wide services like security, versioning, etc.
- provide a *template* configuration for new workspaces

if you change the workspace section in repository.xml you only modify 
the behaviour for newly created workspaces.
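
A small sketch of how one might check which existing workspaces still lack
the setting. It assumes each workspace directory under the workspaces root
contains its own workspace.xml (for example
/opt/jackrabbit/repotest/workspaces/default/workspace.xml) and that the
parameter is written in the same form as in the repository.xml above. The
class and method names are made up for illustration; this is not part of
Jackrabbit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class WorkspaceConfigCheck {

    // Matches <param name="respectDocumentOrder" value="false"/>,
    // tolerating line breaks and extra whitespace between attributes.
    private static final Pattern DOC_ORDER_OFF = Pattern.compile(
        "<param\\s+name=\"respectDocumentOrder\"\\s+value=\"false\"\\s*/>");

    /**
     * Returns every workspace.xml directly below a subdirectory of
     * workspacesRoot that does not explicitly disable document order.
     */
    static List<Path> findWorkspacesStillOrdering(Path workspacesRoot) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(workspacesRoot)) {
            for (Path dir : dirs) {
                Path config = dir.resolve("workspace.xml");
                if (!Files.isRegularFile(config)) {
                    continue; // not a workspace directory
                }
                // The sample configuration declares ISO-8859-1.
                String xml = new String(Files.readAllBytes(config), StandardCharsets.ISO_8859_1);
                if (!DOC_ORDER_OFF.matcher(xml).find()) {
                    result.add(config);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "workspaces");
        if (!Files.isDirectory(root)) {
            System.out.println("not a directory: " + root);
            return;
        }
        for (Path p : findWorkspacesStillOrdering(root)) {
            System.out.println("respectDocumentOrder not set to false in: " + p);
        }
    }
}
```

Any file it reports would need the same SearchIndex parameter added by hand,
followed by a repository restart.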

as an alternative you can also add an 'order by' clause to your query:

//*[jcr:contains(., 'foo')] order by jcr:score descending

this will force jackrabbit to order result nodes by relevance, instead 
of expensive document order.
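
As a rough sketch, the clause can be appended to an existing XPath statement
before it is handed to the query manager. The helper below is an
illustration only: the class and method names are made up, the check for an
existing clause is deliberately naive, and the sample statement is a guess
at the XPath behind the LIKE query in the logs.

```java
import java.util.Locale;

final class ScoreOrderedQuery {

    private static final String ORDER_BY_SCORE = " order by jcr:score descending";

    /**
     * Appends a relevance ordering unless the statement already contains
     * an "order by" clause. The check is a plain substring test and would
     * be fooled by " order by " inside a string literal.
     */
    static String orderByScore(String xpath) {
        String trimmed = xpath.trim();
        if (trimmed.toLowerCase(Locale.ROOT).contains(" order by ")) {
            return trimmed;
        }
        return trimmed + ORDER_BY_SCORE;
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the query from the debug output.
        String stmt = orderByScore("//element(*, nt:unstructured)[jcr:like(@email, '%jean%')]");
        System.out.println(stmt);
        // With the JCR API the statement would then be executed roughly as:
        //   session.getWorkspace().getQueryManager()
        //          .createQuery(stmt, javax.jcr.query.Query.XPATH).execute();
    }
}
```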

regards
  marcel




Re: Newbies question

Posted by Jérôme BENOIS <be...@argia-engineering.fr>.
Hi All,

	Thanks for your response.
	
	I carried out some tests with 50000 nodes (small nodes with 3
properties each). Creating them took 25 minutes, and my store weighs 2.5 GB.

	A simple query still takes a long time: about 5 minutes.

	I applied your suggestion about document order here:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
        <param name="persistent" value="true"/>
    </FileSystem>
    <Security appName="Jackrabbit">
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <param name="anonymousId" value="anonymous"/>
        </LoginModule>
    </Security>
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager"/>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="autoRepair" value="false"/>
            <param name="respectDocumentOrder" value="false"/>
        </SearchIndex>
    </Workspace>
    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager"/>
    </Versioning>
</Repository>

	I am using the version from Subversion, and I enabled debug logging
when I ran my simple query:

DEBUG main org.apache.jackrabbit.core.query.lucene.QueryImpl - Executing query:
+ Root node
+ Select properties: *
  + PathQueryNode
    + LocationStepQueryNode:  NodeTest={} Descendants=false Index=NONE
    + LocationStepQueryNode:  NodeTest=* Descendants=true Index=NONE
      + AndQueryNode
        + NodeTypeQueryNode: Prop={http://www.jcp.org/jcr/1.0}primaryType Value={http://www.jcp.org/jcr/nt/1.0}unstructured
        + AndQueryNode
          + RelationQueryNode: Op: LIKE Prop={}email Type=STRING Value=a%

DEBUG main org.apache.jackrabbit.core.query.lucene.AbstractIndex - merging segments _0 (1 docs) into _1 (1 docs)
DEBUG main org.apache.jackrabbit.core.query.lucene.AbstractIndex - closing IndexWriter.
INFO main org.apache.jackrabbit.core.query.lucene.DocNumberCache - size=60/1024, #accesses=1001, #hits=941, #misses=60, cacheRatio=95%
DEBUG Timer-2 org.apache.jackrabbit.core.query.lucene.MultiIndex - Flushing index after being idle for 3615 ms.
DEBUG Timer-2 org.apache.jackrabbit.core.query.lucene.IndexMerger - index added: name=_ii, numDocs=1
DEBUG Timer-2 org.apache.jackrabbit.core.query.lucene.MultiIndex - Committed in-memory index in 2ms.
DEBUG IndexMerger org.apache.jackrabbit.core.query.lucene.AbstractIndex - merging segments _0 (8416 docs) into _1 (8416 docs)
INFO IndexMerger org.apache.jackrabbit.core.query.lucene.IndexMerger - merged 8416 documents in 4206 ms into _ih.
DEBUG IndexMerger org.apache.jackrabbit.core.query.lucene.IndexMerger - replace indexes
DEBUG IndexMerger org.apache.jackrabbit.core.query.lucene.AbstractIndex - closing IndexWriter.
DEBUG IndexMerger org.apache.jackrabbit.core.query.lucene.IndexMerger - index added: name=_ih, numDocs=8416
DEBUG Timer-2 org.apache.jackrabbit.core.query.lucene.MultiIndex - Flushing index after being idle for 3339 ms.
DEBUG main org.apache.jackrabbit.core.query.lucene.DocOrderNodeIteratorImpl - 1537 node(s) ordered in 331882 ms
INFO main fr.openmodel.cms.imports.process.TestImportMgtProcess - testInsert 3 contentUnits.size()=1537
INFO Thread-4 org.apache.jackrabbit.core.observation.ObservationManagerFactory - Notification of EventListeners stopped.
DEBUG Thread-4 org.apache.jackrabbit.core.query.lucene.IndexMerger - dispose IndexMerger
INFO IndexMerger org.apache.jackrabbit.core.query.lucene.IndexMerger - IndexMerger terminated
DEBUG Thread-4 org.apache.jackrabbit.core.query.lucene.IndexMerger - quit sent
DEBUG Thread-4 org.apache.jackrabbit.core.query.lucene.IndexMerger - IndexMerger thread stopped
DEBUG Thread-4 org.apache.jackrabbit.core.query.lucene.IndexMerger - merge queue size: 0
INFO Thread-4 org.apache.jackrabbit.core.query.lucene.SearchIndex - Index closed: /opt/jackrabbit/repotest/workspaces/default/index
DEBUG Thread-4 org.apache.jackrabbit.core.observation.ObservationManagerImpl - removing EventListener: org.apache.jackrabbit.core.lock.LockManagerImpl@149a794
DEBUG Thread-4 org.apache.jackrabbit.core.observation.ObservationManagerImpl - removing EventListener: org.apache.jackrabbit.core.SearchManager@64023c
DEBUG Thread-4 org.apache.jackrabbit.core.observation.ObservationManagerImpl - removing EventListener: org.apache.jackrabbit.core.RepositoryImpl@190a0d6


Can you help me, please?

Thanks for your help,

Best Regards,
Jérôme.



Re: Newbies question

Posted by Marcel Reutegger <ma...@gmx.net>.
try disabling document order on query results:

<SearchIndex
class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
     [...]
     <param name="respectDocumentOrder" value="false"/>

</SearchIndex>


information about document order is not stored in the index, so for a
large result set the query handler has to load the nodes from storage
to sort them, which is expensive compared to index lookups.

regards
  marcel




Re: Newbies question

Posted by Jérôme BENOIS <be...@argia-engineering.fr>.
Hi,

	Thanks for your response.

	I ran some tests with my 180000 nodes. When I save in blocks of
100 nodes, the import takes 8 hours and my store is very big: 1.3 GB.

	When I launch a simple query, my CPU and RAM are maxed out and execution
takes a very long time.

	Could you help me, please?

Here is my repository.xml:


<?xml version="1.0" encoding="ISO-8859-1"?>
<Repository>
    <FileSystem
class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem>
    <Security appName="Jackrabbit">
        <AccessManager
class="org.apache.jackrabbit.core.security.SimpleAccessManager">
        </AccessManager>
        <LoginModule
class="org.apache.jackrabbit.core.security.SimpleLoginModule">
           <param name="anonymousId" value="anonymous"/>
        </LoginModule>
    </Security>
    <Workspaces rootPath="${rep.home}/workspaces"
defaultWorkspace="default"/>
    <Workspace name="${wsp.name}">
        <FileSystem
class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <PersistenceManager
class="org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager"/>       
        <SearchIndex
class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="useCompoundFile" value="true"/>
            <param name="minMergeDocs" value="100"/>
            <param name="volatileIdleTime" value="3"/>
            <param name="maxMergeDocs" value="100000"/>
            <param name="mergeFactor" value="10"/>
            <param name="bufferSize" value="10"/>
            <param name="cacheSize" value="1000"/>
            <param name="forceConsistencyCheck" value="false"/>
            <param name="autoRepair" value="true"/>
        </SearchIndex>
    </Workspace>
    <Versioning rootPath="${rep.home}/version">
        <FileSystem
class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>        
        <PersistenceManager
class="org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager"/>
    </Versioning>
</Repository>

Best Regards,
Jérôme.


Re: Newbies question

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 1/6/06, Jérôme BENOIS <be...@argia-engineering.fr> wrote:
> I want use jackrabbit in order to create 180000 content nodes. But, how
> to use correctly the session ? Call session.save() for each nodes or
> prefer call session.save() per bloc of 1000 nodes ?

It depends on your performance and memory use requirements. Each
save() will cost you some time, but the more changes you queue up
before calling save() the more memory your process will use to hold
the pending changes. Calling Session.save() once per block of
changes is probably better for such bulk loads.

You may also want to take a look at Workspace.importXML() and
Workspace.getImportContentHandler() as an efficient alternative for
bulk loading large amounts of data.
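
To make the block-wise save concrete, here is a minimal sketch of the
batching logic only. It is not Jackrabbit code: the class name
BatchSaveSketch, the Saver interface and the importInBatches method are
hypothetical names made up for this illustration, and Saver merely stands
in for a call to Session.save() so that the example runs without a
repository. The block size of 1000 is the one from the question, not a
recommendation.

```java
import java.util.function.IntConsumer;

/** Sketch: queue changes and flush them in fixed-size blocks. */
class BatchSaveSketch {

    /** Hypothetical stand-in for Session.save(), which throws a checked exception. */
    @FunctionalInterface
    interface Saver {
        void save() throws Exception;
    }

    /**
     * Calls addNode for indexes 0..total-1 and invokes saver after every
     * batchSize additions, plus once more for a partial final block.
     * Returns the number of save calls made.
     */
    static int importInBatches(int total, int batchSize, IntConsumer addNode, Saver saver)
            throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (int i = 0; i < total; i++) {
            addNode.accept(i);  // real code would add a node and set its properties here
            pending++;
            if (pending == batchSize) {
                saver.save();   // real code would call session.save() here
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {      // flush whatever is left in the last block
            saver.save();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) throws Exception {
        // No-op callbacks: this only demonstrates how often save would be called.
        int saves = importInBatches(180000, 1000, i -> { }, () -> { });
        System.out.println("save() calls: " + saves);  // prints "save() calls: 180"
    }
}
```

A larger block size means fewer save calls but more pending changes held
in memory, so the right value depends on how much heap the import
process can afford.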

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftmanship, JCR consulting, and Java development