Posted to users@jackrabbit.apache.org by Paco Avila <pa...@git.es> on 2006/11/29 20:18:42 UTC

modify a node type definition

I have created a repository with several hundred nodes and now I want
to add "mix:referenceable" to all these nodes. The node type is currently
defined as:

[okm:document] > nt:hierarchyNode, mix:lockable, mix:accessControlled
- okm:author (string) mandatory
- okm:name (string)
- okm:language (string)
- okm:keywords (string)
+ okm:content (okm:resource) primary mandatory

and I want it to become:

[okm:document] > nt:hierarchyNode, mix:referenceable, mix:lockable,
mix:accessControlled
- okm:author (string) mandatory
- okm:name (string)
- okm:language (string)
- okm:keywords (string)
+ okm:content (okm:resource) primary mandatory

I have tried

NodeTypeRegistry.reregisterNodeType(NodeTypeDef)

but it seems to be "not yet implemented" :(
-- 
Paco Avila <pa...@git.es>


Re: modify a node type definition

Posted by Paco Avila <pa...@git.es>.
On Thu, 30-11-2006 at 11:50 +0200, Jukka Zitting wrote:
> Hi,
> 
> On 11/30/06, Paco Avila <pa...@git.es> wrote:
> > On Thu, 30-11-2006 at 09:55 +0100, Stefan Guggisberg wrote:
> > > if you add mix:referenceable to an existing node type, existing nodes
> > > of that node type would be in an inconsistent state as they don't
> > > have the jcr:uuid property. that's why you can't add it.
> >
> > But I can add this "mix:referenceable" to all the existing node, isn't
> > it? So the stats will not be inconsistent.
> 
> The problem with applying the change to the node type rather than
> individual nodes is that the node type modification implies a global
> modification of all nodes of that type. We currently don't have the
> functionality in place to achieve such large-scale modifications. Thus
> the "not yet implemented" message.

Ok, I didn't understand the message.

> I would very much like to see us eventually getting to a point where
> we can implement such operations, but that will probably not happen in
> the next few releases since the amount of required effort is
> considerable. Most likely we'll see a sequence of incremental steps to
> support ever larger node type modifications.

Oh, I see. The other option is to create a new repository with
"mix:referenceable" in the node type definition and migrate all the nodes.
But I'll lose the history :(
-- 
Paco Avila <pa...@git.es>


Re: modify a node type definition

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/1/06, Michael Neale <mi...@gmail.com> wrote:
> When you say "don't use versioning" - can I use versioning in the "source"
> repo, but dump it out without the history?

Yes. If you don't need the version histories in the target repository,
then there's no need to avoid versioning.

BR,

Jukka Zitting

Re: modify a node type definition

Posted by Michael Neale <mi...@gmail.com>.
Thanks Jukka.
When you say "don't use versioning" - can I use versioning in the "source"
repo, but dump it out without the history?

On 11/30/06, Jukka Zitting <ju...@gmail.com> wrote:
>
> Hi,
>
> On 11/30/06, Michael Neale <mi...@gmail.com> wrote:
> > So any hints on how best to design things to make "migration" as easy as
> > possible in future? any magic tricks?
>
> Good question. Some rules of thumb:
>
> * When developing your application, use an XML import or a simple
> builder application to set up your test content. This way you can
> easily scrap the entire test repository, make any modifications (node
> type changes, etc.) you need, and then recreate the test repository
> without worrying about migrating existing content.
>
> * Avoid putting things directly below the root node. Use a top-level
> node like /my:root or /my:content and place all your application
> content under that. This way you can easily export/import content
> using the standard XML mappings without having to worry about the
> protected /jcr:system subtree.
>
> * If your have lots of content, then you should structure it so that
> you can export and import it in smaller pieces. If you use references,
> make sure that you can import the parts being referenced before the
> referencing parts.
>
> * Don't use versioning. As of now Jackrabbit doesn't support migrating
> version histories across repositories. This will likely change sooner
> or later, but I wouldn't yet count on it.
>
> There is quite a lot of demand for more comprehensive backup/restore
> and migration tools so at least I'm quite confident that sooner or
> later (hopefully within next year :-) we'll have support for those use
> cases. Nicolas' GSoC project was a good start towards achieving that
> goal.
>
> BR,
>
> Jukka Zitting
>

Re: modify a node type definition

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 11/30/06, Michael Neale <mi...@gmail.com> wrote:
> So any hints on how best to design things to make "migration" as easy as
> possible in future? any magic tricks?

Good question. Some rules of thumb:

* When developing your application, use an XML import or a simple
builder application to set up your test content. This way you can
easily scrap the entire test repository, make any modifications (node
type changes, etc.) you need, and then recreate the test repository
without worrying about migrating existing content.

* Avoid putting things directly below the root node. Use a top-level
node like /my:root or /my:content and place all your application
content under that. This way you can easily export/import content
using the standard XML mappings without having to worry about the
protected /jcr:system subtree.

* If you have lots of content, then you should structure it so that
you can export and import it in smaller pieces. If you use references,
make sure that you can import the parts being referenced before the
referencing parts.

* Don't use versioning. As of now Jackrabbit doesn't support migrating
version histories across repositories. This will likely change sooner
or later, but I wouldn't yet count on it.

There is quite a lot of demand for more comprehensive backup/restore
and migration tools so at least I'm quite confident that sooner or
later (hopefully within next year :-) we'll have support for those use
cases. Nicolas' GSoC project was a good start towards achieving that
goal.

BR,

Jukka Zitting
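
The third rule of thumb above (import the referenced parts before the parts that reference them) amounts to a topological ordering of the exported pieces. A minimal sketch in plain Java follows, with no JCR dependency. The class name ImportOrder, the map of part names and the sample paths are all hypothetical and only serve as illustration; how the pieces are actually exported and imported is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImportOrder {

    /**
     * Returns the parts in an order where every referenced part comes
     * before the parts that reference it (depth-first topological sort).
     * Keys are part names; values are the parts each one references.
     */
    static List<String> order(Map<String, List<String>> references) {
        List<String> result = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String part : references.keySet()) {
            visit(part, references, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String part, Map<String, List<String>> references,
                              Set<String> done, Set<String> inProgress,
                              List<String> result) {
        if (done.contains(part)) {
            return; // already scheduled for import
        }
        if (!inProgress.add(part)) {
            // Two parts reference each other; they cannot be imported one after the other.
            throw new IllegalStateException("Circular reference involving " + part);
        }
        for (String target : references.getOrDefault(part, List.of())) {
            visit(target, references, done, inProgress, result);
        }
        inProgress.remove(part);
        done.add(part);
        result.add(part); // all referenced parts are already in the list
    }

    public static void main(String[] args) {
        // Hypothetical subtrees below a top-level /my:content node.
        Map<String, List<String>> references = new LinkedHashMap<>();
        references.put("/my:content/documents",
                List.of("/my:content/categories", "/my:content/authors"));
        references.put("/my:content/categories", List.of());
        references.put("/my:content/authors", List.of("/my:content/categories"));
        System.out.println(order(references));
    }
}
```

With the sample data, the categories subtree comes first, then authors, then documents. A circular reference between parts is reported as an error, since such parts would have to be imported together.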

Re: modify a node type definition

Posted by Michael Neale <mi...@gmail.com>.
So any hints on how best to design things to make "migration" as easy as
possible in future? any magic tricks?

On 11/30/06, Jukka Zitting <ju...@gmail.com> wrote:
>
> Hi,
>
> On 11/30/06, Paco Avila <pa...@git.es> wrote:
> > So, I need to create a new repository with this new node definition and
> > migrate the node info (copy). But I'll lose the node history or can I
> > migrate the history too.
>
> Exactly. There are still no simple tools available for migrating large
> repositories even without version histories, so you'll need to do some
> tinkering on your own to achieve this.
>
> You may also want to check the contrib/backup project that Nicolas has
> been working.
>
> BR,
>
> Jukka Zitting
>

Re: modify a node type definition

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 11/30/06, Paco Avila <pa...@git.es> wrote:
> So, I need to create a new repository with this new node definition and
> migrate the node info (copy). But I'll lose the node history or can I
> migrate the history too.

Exactly. There are still no simple tools available for migrating large
repositories even without version histories, so you'll need to do some
tinkering on your own to achieve this.

You may also want to check the contrib/backup project that Nicolas has
been working on.

BR,

Jukka Zitting

Re: modify a node type definition

Posted by Paco Avila <pa...@git.es>.
On Thu, 30-11-2006 at 11:50 +0200, Jukka Zitting wrote:
> Hi,
> 
> On 11/30/06, Paco Avila <pa...@git.es> wrote:
> > On Thu, 30-11-2006 at 09:55 +0100, Stefan Guggisberg wrote:
> > > if you add mix:referenceable to an existing node type, existing nodes
> > > of that node type would be in an inconsistent state as they don't
> > > have the jcr:uuid property. that's why you can't add it.
> >
> > But I can add this "mix:referenceable" to all the existing node, isn't
> > it? So the stats will not be inconsistent.
> 
> The problem with applying the change to the node type rather than
> individual nodes is that the node type modification implies a global
> modification of all nodes of that type. We currently don't have the
> functionality in place to achieve such large-scale modifications. Thus
> the "not yet implemented" message.
> 
> I would very much like to see us eventually getting to a point where
> we can implement such operations, but that will probably not happen in
> the next few releases since the amount of required effort is
> considerable. Most likely we'll see a sequence of incremental steps to
> support ever larger node type modifications.

Ok, thanks.

So, I need to create a new repository with this new node type definition and
migrate the node info (copy). But will I lose the node history, or can I
migrate the history too?

Cheers
-- 
Paco Avila <pa...@git.es>


Re: modify a node type definition

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 11/30/06, Paco Avila <pa...@git.es> wrote:
> On Thu, 30-11-2006 at 09:55 +0100, Stefan Guggisberg wrote:
> > if you add mix:referenceable to an existing node type, existing nodes
> > of that node type would be in an inconsistent state as they don't
> > have the jcr:uuid property. that's why you can't add it.
>
> But I can add this "mix:referenceable" to all the existing node, isn't
> it? So the stats will not be inconsistent.

The problem with applying the change to the node type rather than
individual nodes is that the node type modification implies a global
modification of all nodes of that type. We currently don't have the
functionality in place to achieve such large-scale modifications. Thus
the "not yet implemented" message.

I would very much like to see us eventually getting to a point where
we can implement such operations, but that will probably not happen in
the next few releases since the amount of required effort is
considerable. Most likely we'll see a sequence of incremental steps to
support ever larger node type modifications.

BR,

Jukka Zitting

Re: modify a node type definition

Posted by Paco Avila <pa...@git.es>.
On Thu, 30-11-2006 at 09:55 +0100, Stefan Guggisberg wrote:
> On 11/29/06, Paco Avila <pa...@git.es> wrote:
> > I have created a repository with several hundred of nodes and now I want
> > to add a "mix:referenceable" to all these nodes. The node actually is
> > defined as:
> >
> > [okm:document] > nt:hierarchyNode, mix:lockable, mix:accessControlled
> > - okm:author (string) mandatory
> > - okm:name (string)
> > - okm:language (string)
> > - okm:keywords (string)
> > + okm:content (okm:resource) primary mandatory
> >
> > and I want to become:
> >
> > [okm:document] > nt:hierarchyNode, mix:referenceable, mix:lockable,
> > mix:accessControlled
> > - okm:author (string) mandatory
> > - okm:name (string)
> > - okm:language (string)
> > - okm:keywords (string)
> > + okm:content (okm:resource) primary mandatory
> >
> > I'have try
> >
> > NodeTypeRegistry.reregisterNodeType(NodeTypeDef)
> >
> > but seems to be "not yet implemented" :(
> 
> mix:referenceable declares a mandatory property called jcr:uuid.
> 
> if you add mix:referenceable to an existing node type, existing nodes
> of that node type would be in an inconsistent state as they don't
> have the jcr:uuid property. that's why you can't add it.

But I can add "mix:referenceable" to all the existing nodes, can't I?
Then the state will not be inconsistent.
-- 
Paco Avila <pa...@git.es>


Re: modify a node type definition

Posted by Stefan Guggisberg <st...@gmail.com>.
On 11/29/06, Paco Avila <pa...@git.es> wrote:
> I have created a repository with several hundred of nodes and now I want
> to add a "mix:referenceable" to all these nodes. The node actually is
> defined as:
>
> [okm:document] > nt:hierarchyNode, mix:lockable, mix:accessControlled
> - okm:author (string) mandatory
> - okm:name (string)
> - okm:language (string)
> - okm:keywords (string)
> + okm:content (okm:resource) primary mandatory
>
> and I want to become:
>
> [okm:document] > nt:hierarchyNode, mix:referenceable, mix:lockable,
> mix:accessControlled
> - okm:author (string) mandatory
> - okm:name (string)
> - okm:language (string)
> - okm:keywords (string)
> + okm:content (okm:resource) primary mandatory
>
> I'have try
>
> NodeTypeRegistry.reregisterNodeType(NodeTypeDef)
>
> but seems to be "not yet implemented" :(

mix:referenceable declares a mandatory property called jcr:uuid.

if you add mix:referenceable to an existing node type, existing nodes
of that node type would be in an inconsistent state as they don't
have the jcr:uuid property. that's why you can't add it.

cheers
stefan

> --
> Paco Avila <pa...@git.es>
>
>

Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Dan,

thanks for the clarification. I think now I understand.

dan wrote:
> An example of my query scenario would be: 
> For my:document nodes that meet the following query conditions: 
>  1) refer to countries US, or Canada, or Mexico and, 
>  2) refer to sizes Small or Medium, and
>  3) refer to colors Red or Yellow, and
>  4) document content contains arbitrary user entered text
>  5) ... other property based query parameters
> Return the country names referred by above document result (in a unique
> list), and count the number of documents under each returned country name. 
> An example of expected result set may be: 
>    US, 19
>    Canada, 4
> 
> You may have noticed that in query condition 1), users are allowed to
> specify target countries, but the result may not have all country names as
> specified (Mexico here), because other document filtering parameters may
> prevent any "Mexico"-referring document from showing in the result.
> 
> I hope this makes things clear for you. 
> My perception is that I can't achieve the result with One query because
> there no "Select distinct" and "inner join" equivalent in JCR/Jackrabbit.  

and you would also need 'group by' and aggregate functions like sum(). 
enhancements like those are currently discussed in JSR 283.

> Would you have any suggestion/comment on the approach? 

I think the best you can do with the current JCR version is:

1) query for all categories (countries) that match your query. That's the SQL 
query you posted initially but converted into an XPath with an additional 
jcr:deref() at the end to the category node.

2) for each matching category run a new query for that category, which will 
return the documents for that category. Then get the number of documents by 
calling NodeIterator.getSize() instead of looping through all the matches. This 
should be faster than your initial approach.

regards
  marcel

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
Hi Marcel,

>I'm also a bit confused whether you are finally interested in documents or
>categories. The SQL query you posted earlier indicates that you are
>interested in documents, but the above XPath query indicates that you are
>interested in the referenced category text.

Yes, I am (ultimately) interested in the referenced Category text. The query
in my previous post was a test towards the goal described here. Let me put it
in plain language...

All my:document nodes refer to MULTIPLE "category nodes" - say Country,
Size, and ColorScheme. Each of them has multiple category-entries. 
Say category "Country" has entries "US, Canada, Mexico, Argentina, Brazil";
Category "Size" has entries "Small, Medium, Large"; And Category
"ColorScheme" has entries "Red, Green, Yellow, Pink, Black".

For any my:document node, 
- it points to ONE or more entries of Country
- it points to zero or more entries of Size
- it points to zero or more entries of ColorScheme
- it has other text/date properties

An example of my query scenario would be: 
For my:document nodes that meet the following query conditions: 
 1) refer to countries US, or Canada, or Mexico and, 
 2) refer to sizes Small or Medium, and
 3) refer to colors Red or Yellow, and
 4) document content contains arbitrary user entered text
 5) ... other property based query parameters
Return the country names referred by above document result (in a unique
list), and count the number of documents under each returned country name. 
An example of expected result set may be: 
   US, 19
   Canada, 4

You may have noticed that in query condition 1), users are allowed to
specify target countries, but the result may not have all country names as
specified (Mexico here), because other document filtering parameters may
prevent any "Mexico"-referring document from showing in the result.

I hope this makes things clear for you. 
My perception is that I can't achieve the result with one query because
there is no "Select distinct" or "inner join" equivalent in JCR/Jackrabbit.

Would you have any suggestion/comment on the approach? 

Regards,
Dan
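
The expected result set described above (a unique list of country names with a document count for each) can be illustrated in plain Java, once the documents matching the other conditions are in hand. This is only a sketch of the grouping itself, not a JCR query. The class name CategoryCounts, the representation of each document as a set of country names, and the sample data are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CategoryCounts {

    /**
     * Counts, for each requested country, how many of the matching documents
     * refer to it. Each element of countriesPerDoc holds the country names one
     * document refers to. Countries without any matching document are left out,
     * which is why a requested country can be missing from the result.
     */
    static Map<String, Integer> countByCountry(List<Set<String>> countriesPerDoc,
                                               Set<String> requested) {
        Map<String, Integer> counts = new TreeMap<>(); // unique keys, sorted by name
        for (Set<String> countries : countriesPerDoc) {
            for (String country : countries) {
                if (requested.contains(country)) {
                    counts.merge(country, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical documents that already passed the size, color and text filters.
        List<Set<String>> docs = List.of(
                Set.of("US"),
                Set.of("US", "Canada"),
                Set.of("Brazil"));
        System.out.println(countByCountry(docs, Set.of("US", "Canada", "Mexico")));
    }
}
```

In the sample run "Mexico" is requested but no document refers to it, so it does not appear in the output, matching the behaviour described for query condition 1).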





Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
dan wrote:
> In my case, my:document nodes refer to MULTIPLE "categories". I need to
> support queries that say: 
> 
> //element(*, my:document)
> [ (jcr:deref(@my:cat1Ref, 'cat1Entry'))    
>        (--- the document must refer to at least one of the 'category1'
>             entries)
> 
>   And (jcr:deref(@my:cat2Ref, 'cat2Entry')/@my:text = 'cat2_Entry_2')
>        (--- the document must also refer to category2 entry, named
>             "cat2_entry_2")
> 
>   And (jcr:deref(@my:cat3Ref, 'cat3Entry')/@my:text = 'cat3_Entry_x')
>        (--- the document must also refer to category3 entry, named
>             "cat3_entry_x")
> 
> ]/jcr:deref(@my:cat1Ref, 'cat1Entry')/@my:text order by @my:text
> 			
> Due to the query limitations, I had to gather all document nodes and
> manually compile the list of referenced category entries.
> 
> Would you advise on some other approaches? 

I'm not sure I understand your requirements correctly. I would simplify the 
query by not using the jcr:deref() in the predicate (well, you can't anyway, 
because it's not supported) but replace it with the uuid of the referenced 
category. I assume that jcr:deref(@my:cat3Ref, 'cat3Entry')/@my:text = 
'cat3_Entry_x' can be easily replaced because it points to a well known node or 
at least limited set of nodes?

I'm also a bit confused whether you are finally interested in documents or 
categories. The SQL query you posted earlier indicates that you are interested 
in documents, but the above XPath query indicates that you are interested in the 
referenced category text.

regards
  marcel

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
Hi Marcel,

> > - RowIterator.nextRow(): 13234-9375 = 3859ms
> 
> that indicates that most of the time is spent in retrieving the values
> from persistent storage. Does your application really require to read
> all 2000 documents?

Actually, I was forced to use this approach because I could not find a
proper content structure and/or query syntax to support the client's
requirement.

In my case, my:document nodes refer to MULTIPLE "categories". I need to
support queries that say: 

//element(*, my:document)
[ (jcr:deref(@my:cat1Ref, 'cat1Entry'))    
       (--- the document must refer to at least one of the 'category1'
            entries)

  And (jcr:deref(@my:cat2Ref, 'cat2Entry')/@my:text = 'cat2_Entry_2')
       (--- the document must also refer to category2 entry, named
            "cat2_entry_2")

  And (jcr:deref(@my:cat3Ref, 'cat3Entry')/@my:text = 'cat3_Entry_x')
       (--- the document must also refer to category3 entry, named
            "cat3_entry_x")

]/jcr:deref(@my:cat1Ref, 'cat1Entry')/@my:text order by @my:text
			
Due to the query limitations, I had to gather all document nodes and
manually compile the list of referenced category entries.

Would you advise on some other approaches? 

Thanks,
Dan


Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
dan wrote:
> And here is the result (Extracted from the Log4j output):
> - Query.execute(): 9375-7484 = 1891ms

do you see an improvement here when you execute the query again? Initial queries 
can be quite slow because the hierarchy structure is resolved from the index but 
later cached which speeds up queries significantly.

> - QueryResult.getRows(): less than 1 ms
> - RowIterator.nextRow(): 13234-9375 = 3859ms

that indicates that most of the time is spent in retrieving the values from 
persistent storage. Does your application really require to read all 2000 documents?

regards
  marcel

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
> Can you give more details where those 3 seconds are spent? I'd be
> interested to

The query is in SQL: 
SELECT Source FROM cm:document WHERE jcr:path LIKE '/cm:contentRoot/CCD/%'
AND cm:state='published' AND  (flag3='2'  OR flag3='1' )   AND
(Source='640'  OR Source='240'  OR Source='220'  OR Source='130'  OR
Source='160'  OR Source='020'  OR Source='630'  OR Source='760'  OR
Source='050'  OR Source='730'  OR Source='190'  OR Source='230'  OR
Source='360'  OR Source='530'  OR Source='040'  OR Source='330'  OR
Source='720'  OR Source='750'  OR Source='390'  OR Source='540'  OR
Source='280'  OR Source='110'  OR Source='580'  OR Source='620' )   AND
(Category1='090'  OR Category1='150'  OR Category1='130'  OR Category1='160'
OR Category1='020'  OR Category1='060'  OR Category1='050'  OR
Category1='140'  OR Category1='040'  OR Category1='010'  OR Category1='080'
OR Category1='110'  OR Category1='030'  OR Category1='070'  OR
Category1='100'  OR Category1='120' )   ORDER BY Source

Node type cm:document has properties "Source" and "Category1" whose values
equal the values of other category nodes. (I can't use references because I
need to query on these multiple values to filter documents, which is not
supported in current JCR if node references are used).

And here is the result (Extracted from the Log4j output):
- Query.execute(): 9375-7484 = 1891ms
- QueryResult.getRows(): less than 1 ms
- RowIterator.nextRow(): 13234-9375 = 3859ms

***********************************
[2006-12-01 10:32,  7484]DEBUG[WebContainer : 1] executing SQL query for
categories: SELECT Source FROM cm:document WHERE jcr:path LIKE
'/cm:contentRoot/CCD/%'   AND cm:state='published' AND  (flag3='2'  OR
flag3='1' )   AND  (Source='640'  OR Source='240'  OR Source='220'  OR
Source='130'  OR Source='160'  OR Source='020'  OR Source='630'  OR
Source='760'  OR Source='050'  OR Source='730'  OR Source='190'  OR
Source='230'  OR Source='360'  OR Source='530'  OR Source='040'  OR
Source='330'  OR Source='720'  OR Source='750'  OR Source='390'  OR
Source='540'  OR Source='280'  OR Source='110'  OR Source='580'  OR
Source='620' )   AND  (Category1='090'  OR Category1='150'  OR
Category1='130'  OR Category1='160'  OR Category1='020'  OR Category1='060'
OR Category1='050'  OR Category1='140'  OR Category1='040'  OR
Category1='010'  OR Category1='080'  OR Category1='110'  OR Category1='030'
OR Category1='070'  OR Category1='100'  OR Category1='120' )   ORDER BY
Source
[2006-12-01 10:32,  9375]DEBUG[WebContainer : 1] query successful, getting
RowIterator
[2006-12-01 10:32,  9375]DEBUG[WebContainer : 1] got iterator, total number
of entries: 2092
[2006-12-01 10:32,  9375]DEBUG[WebContainer : 1] start parsing categories..
[2006-12-01 10:32, 13234]DEBUG[WebContainer : 1] end processing categories
************************************

Thanks,
Dan


Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
dan wrote:
> IMO, "order by NOT_JUST_jcr:score" is very common use case. The way that
> retrieving all nodes from multiple BLOBs into Java objects and then do Java
> sorting, won't have any performance advantage over that allowing RDB to
> handle everything in one shot. 

The expensive sorting is only done when document order is requested. order by 
jcr:score() was just an example. If you order by any other property lucene will 
do the sorting as well, just like ordering by score.

Can you give more details where those 3 seconds are spent? I'd be interested to 
know how much time is spent in:

- Query.execute()
- QueryResult.getRows()
- RowIterator.nextRow()

regards
  marcel
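
A minimal sketch of how such a per-phase breakdown could be measured is shown below. It uses only the Java standard library: the three phases are stand-ins for Query.execute(), QueryResult.getRows() and the nextRow() loop, and the names PhaseTimer, timed and drain are hypothetical helpers, not part of JCR or Jackrabbit.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class PhaseTimer {

    /** Runs one phase, prints its elapsed time in milliseconds, and returns its result. */
    static <T> T timed(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        T result = phase.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }

    /** Walks an iterator to the end and returns the number of elements seen. */
    static <T> int drain(Iterator<T> rows) {
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-ins for the three JCR calls; a real test would call the query API here.
        List<String> result = timed("execute", () -> List.of("row1", "row2", "row3"));
        Iterator<String> rows = timed("getRows", result::iterator);
        int seen = timed("iterate", () -> drain(rows));
        System.out.println("rows read: " + seen);
    }
}
```

The real JCR calls throw the checked RepositoryException, so wrapping them in a Supplier would require catching that exception inside the lambda or using a custom functional interface that declares it.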

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
Thanks Marcel,

> if your query does not have an 'order by' clause AND the query handler 
> configuration uses the default value for the
> 'respectDocumentOrder' parameter. In that case there is a post processing
> in the query result which orders the result nodes in document order.

That explains why it was slow for me - I have "order by @my:propname" in the
query, although I've already set "respectDocumentOrder" to false.

Now I somewhat agree with the idea from threads a few days back, about
"expanding RDB schema" and "if using RDB repository, let RDB do all
queries". 

IMO, "order by NOT_JUST_jcr:score" is a very common use case. Retrieving all
nodes from multiple BLOBs into Java objects and then sorting them in Java
won't have any performance advantage over letting the RDB handle everything
in one shot. 
Also, many RDB products now have full-text search capability, although they
may not be as good as Lucene. When considering the overall performance,
it might be legitimate to think about an "RDB oriented search/query
mechanism". 

Of course, that may fall beyond the scope of Jackrabbit, as a reference impl
of JCR.

Thanks again & 
Best regards,
Dan
     







Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
dan wrote:
> All at once? I thought the Lucene search would return a set of Node UUIDs or
> something similar. Then the reading of actual Rows/Nodes from the result is
> incremental (by smaller chunks). 

well, depending on the query you have and the configuration it may happen that 
all result nodes are read at once. This is the case if your query does not have 
an 'order by' clause AND the query handler configuration uses the default value 
for the 'respectDocumentOrder' parameter. In that case there is a post-processing 
step in the query result which orders the result nodes in document order.

If you have an 'order by jcr:score()' OR if you set the 'respectDocumentOrder' 
to false the query result will read the nodes on demand from persistent storage 
when you request them through either Row- or NodeIterator.

>> does the performance improve when you look again through the entries?
> 
> Yes, I saw a slight improvement. But still around 3 seconds. 

Try changing the configuration to respectDocumentOrder=false.

regards
  marcel

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
Hi,

> That's probably because jackrabbit needs to read all 2000 entries from
> disk.

All at once? I thought the Lucene search would return a set of Node UUIDs or
something similar. Then the reading of actual Rows/Nodes from the result is
incremental (by smaller chunks). 

> does the performance improve when you look again through the entries?

Yes, I saw a slight improvement. But still around 3 seconds. 

I'm using DB2FileSystem. The Jackrabbit FAQ says DBFileSystem is "slower
than native file system", while LocalFileSystem is "slow on Windows boxes". 
Which of these two file systems is faster on a Windows box? Has anyone tested
with both file systems on Windows?

Also, if I want to try using LocalFileSystem, can I simply switch the
settings in repository.xml? 

Thanks,
Dan


Re: RowIterator loop is slow?

Posted by Marcel Reutegger <ma...@gmx.net>.
That's probably because jackrabbit needs to read all 2000 entries from disk. 
does the performance improve when you look again through the entries?

regards
  marcel

dan wrote:
> Hi,
> I was using RowIterator to loop through about 2000 entries in query result
> and it took about 3+ seconds. 
> I stripped the code to the bare loop structure like below: 
> 	logger.debug("start loop");
> 	while (rows.hasNext()){
> 		Row row = rows.nextRow();
> 		Value[] values = row.getValues();
> 	}  
> 	logger.debug("end loop");
> 
> The time for going through the entire RowSet is still 3+ second. Tried with
> NodeIterator, the result did not change much. 
> 
> Could anyone advise if this is the normal performance? I'm running this code
> on a Windows 2003 server.
> 
> Thanks,
> Dan
> 

RE: RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
I've seen some discussion on slow performance on Windows boxes. But those are
about using LocalFileSystem on Windows. In my case, the repository uses
SimpleDBPersistenceManager and DBFileSystem (DB2). 
I wondered whether the Lucene index on a Windows file system might also be slow
to search, but since I've already got the RowIterator/NodeIterator, I
guess looping through the iterators has nothing to do with the Lucene index.
Is that correct? 
Thanks
Dan


> -----Original Message-----
> From: dan [mailto:danz8086@hotmail.com]
> Sent: November 29, 2006 3:01 PM
> To: users@jackrabbit.apache.org
> Subject: RowIterator loop is slow?
> 
> Hi,
> I was using RowIterator to loop through about 2000 entries in query result
> and it took about 3+ seconds.
> I stripped the code to the bare loop structure like below:
> 	logger.debug("start loop");
> 	while (rows.hasNext()){
> 		Row row = rows.nextRow();
> 		Value[] values = row.getValues();
> 	}
> 	logger.debug("end loop");
> 
> The time for going through the entire RowSet is still 3+ second. Tried
> with
> NodeIterator, the result did not change much.
> 
> Could anyone advise if this is the normal performance? I'm running this
> code
> on a Windows 2003 server.
> 
> Thanks,
> Dan



RowIterator loop is slow?

Posted by dan <da...@hotmail.com>.
Hi,
I was using RowIterator to loop through about 2000 entries in a query result
and it took about 3+ seconds. 
I stripped the code down to the bare loop structure shown below: 
	logger.debug("start loop");
	while (rows.hasNext()){
		Row row = rows.nextRow();
		Value[] values = row.getValues();
	}  
	logger.debug("end loop");

The time for going through the entire RowSet is still 3+ seconds. I tried with
NodeIterator, and the result did not change much. 

Could anyone advise if this is the normal performance? I'm running this code
on a Windows 2003 server.

Thanks,
Dan