You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jackrabbit.apache.org by majohnst <ma...@lattaoutdoors.com> on 2009/10/01 20:52:23 UTC

Date Property Performance in 1.6

I am evaluating migrating from 1.5 to 1.6. I created several test cases that
prove the query performance of 1.6 is the same or better than 1.5. That is
until I add a date property into my query. 

In my tests, my base query is:
//element(*,my:namespace)[@property='value'] order by @datestart descending

The time to run this query in 1.5 and 1.6 is:
1.5 = 1.5 seconds
1.6 = 1.5 seconds

If I add a date property:
//element(*,my:namespace)[@property='value' and
@datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
@datestart descending

the results are:
1.5 = 1.5 seconds
1.6 = 3.5 seconds

What changed in version 1.6 that would cause the date performance to degrade
so much?
-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25704607.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by Ard Schrijvers <a....@onehippo.com>.
On Fri, Oct 2, 2009 at 11:52 AM, Alexander Klimetschek <ak...@day.com> wrote:
> On Thu, Oct 1, 2009 at 20:52, majohnst <ma...@lattaoutdoors.com> wrote:
>> If I add a date property:
>> //element(*,my:namespace)[@property='value' and
>> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
>> @datestart descending
>>
>> the results are:
>> 1.5 = 1.5 seconds
>> 1.6 = 3.5 seconds
>>
>> What changed in version 1.6 that would cause the date performance to degrade
>> so much?
>
> Are the query times still consistently slower when you ran the query
> multiple times and take the average?

1.5 seconds for a very simple query like:

 //element(*,my:namespace)[@property='value'] order by @datestart descending

is strange already. As Alexander points out, you need to test like 10
times the query, *after* you ran the query already 10 times. This is
in order to rule out FS caches warming up, lucene caches, etc etc.
Also, pls, use a setLimit(1) on the query (perhaps you first need to
cast to QueryImpl) to rule out the fetching of the nodes: this way,
you get a better idea *how* long the lucene part takes. Again, 1.5
seconds is very long, but might be reasonable for the very first time
you order on datestart as then all fieldcache's in lucene are still
empty.

Regards Ard

>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>

Re: Date Property Performance in 1.6

Posted by Alexander Klimetschek <ak...@day.com>.
On Thu, Oct 1, 2009 at 20:52, majohnst <ma...@lattaoutdoors.com> wrote:
> If I add a date property:
> //element(*,my:namespace)[@property='value' and
> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
> @datestart descending
>
> the results are:
> 1.5 = 1.5 seconds
> 1.6 = 3.5 seconds
>
> What changed in version 1.6 that would cause the date performance to degrade
> so much?

Are the query times still consistently slower when you ran the query
multiple times and take the average?

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
I've created an issue in JIRA for this:
https://issues.apache.org/jira/browse/JCR-2353




Marcel Reutegger wrote:
> 
> Hi,
> 
> On Tue, Oct 6, 2009 at 14:50, majohnst <ma...@lattaoutdoors.com> wrote:
>>
>> Jackrabbit 1.6 uses Lucene 2.4.1. I think 1.5 uses Lucene 2.4.0. In
>> Jackrabbit 1.6, some things were refactored to work better with 2.4.1.
> 
> jackrabbit 1.5 depends on lucene 2.3.2.
> 
>> The SortedLuceneQueryHits is new in 1.6, so I cannot tell if the
>> performance
>> drop is caused by this new Jackrabbit code or by some code changes in
>> Lucene
>> 2.4.1.
> 
> can you please file a jira issue with your observations and describe
> in more detail how your content structure looks like (number of nodes,
> distinct date values, other property values that you used in your
> query)?
> 
> thanks
> 
> regards
>  marcel
> 
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25856478.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Tue, Oct 6, 2009 at 14:50, majohnst <ma...@lattaoutdoors.com> wrote:
>
> Jackrabbit 1.6 uses Lucene 2.4.1. I think 1.5 uses Lucene 2.4.0. In
> Jackrabbit 1.6, some things were refactored to work better with 2.4.1.

jackrabbit 1.5 depends on lucene 2.3.2.

> The SortedLuceneQueryHits is new in 1.6, so I cannot tell if the performance
> drop is caused by this new Jackrabbit code or by some code changes in Lucene
> 2.4.1.

can you please file a jira issue with your observations and describe
in more detail how your content structure looks like (number of nodes,
distinct date values, other property values that you used in your
query)?

thanks

regards
 marcel

Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
Jackrabbit 1.6 uses Lucene 2.4.1. I think 1.5 uses Lucene 2.4.0. In
Jackrabbit 1.6, some things were refactored to work better with 2.4.1.
Overall, most queries do perform better. But I guess this type of query did
not see the intended performance gain.

The SortedLuceneQueryHits is new in 1.6, so I cannot tell if the performance
drop is caused by this new Jackrabbit code or by some code changes in Lucene
2.4.1.





Ard Schrijvers-3 wrote:
> 
> On Mon, Oct 5, 2009 at 10:38 PM, majohnst <ma...@lattaoutdoors.com> wrote:
>>
>> I've done a lot more digging and the slowdown seems to happen in the
>> SortedLuceneQueryHits class. Specifically in the first two lines of the
>> getHits() method. Line one creates a TopFieldDocCollector and line 2
>> searches the index with searcher.search(query, collector).
> 
> has this code changed wrt the jackrabbit version for which is was
> faster? If not, then what are the lucene versions in both cases?
> 
>>
>> Since the query is trying to find a date that is >=, does this get turned
>> into a range query in lucene? Would that slow things down?
> 
> Well, range queries are indeed slow. But, they shouldn't become slower
> overnight. There are more effective ways in lucene these days, but
> that is something for coming versions.
> 
> Regards Ard
> 
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25768083.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by Ard Schrijvers <a....@onehippo.com>.
On Mon, Oct 5, 2009 at 10:38 PM, majohnst <ma...@lattaoutdoors.com> wrote:
>
> I've done a lot more digging and the slowdown seems to happen in the
> SortedLuceneQueryHits class. Specifically in the first two lines of the
> getHits() method. Line one creates a TopFieldDocCollector and line 2
> searches the index with searcher.search(query, collector).

has this code changed wrt the jackrabbit version for which is was
faster? If not, then what are the lucene versions in both cases?

>
> Since the query is trying to find a date that is >=, does this get turned
> into a range query in lucene? Would that slow things down?

Well, range queries are indeed slow. But, they shouldn't become slower
overnight. There are more effective ways in lucene these days, but
that is something for coming versions.

Regards Ard

>
>
>
> majohnst wrote:
>>
>> I am using the ObjectContentManager to run my queries, so I am not able to
>> set the setLimit(1). I also thought that since I have an order by clause,
>> the order by clause would override anything the respect document order
>> property does.
>>
>> The code for my query is:
>>
>> ObjectContentManager manager ... (injected from parent)
>> String finalQuery = "//element(*,my:namespace)[@property='value' and
>> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
>> @datestart descending"
>> ObjectIterator iter = (ObjectIterator)
>> manager.getObjectIterator(finalQuery, Query.XPATH);
>>
>> I then iterate over the ObjectIterator to grab the first result. Normally
>> I would use iter.skip() to jump to a particular page in the search results
>> and begin iterating from there.
>>
>>
>> The ObjectContentManager.getObjectIterator() ends up calling
>> getNodeIterator(). In getNodeIterator(), the code is:
>>
>> Query jcrQuery =
>> session.getWorkspace().getQueryManager().createQuery(query, language);
>> QueryResult queryResult = jcrQuery.execute();
>> NodeIterator nodeIterator = queryResult.getNodes();
>> return nodeIterator;
>>
>>
>> By stepping through the process, I can see that the major bottleneck
>> happens in the NodeIterator for the line:
>> QueryResult queryResult = jcrQuery.execute();
>>
>>
>> So I really think that is has something to do with the query and not the
>> other parts of the process. Plus, when the date field is not present in
>> the query, the iterations and other methods all work fine without a
>> problem.
>>
>>
>>
>> Ard Schrijvers-3 wrote:
>>>
>>> <snip/>
>>>
>>>> In both the 1.5 and 1.6 sequential test, I only saw a minor performance
>>>> improvement in query time from query #1 to #2. But queries #2 to #20 all
>>>> had
>>>> the same query times. I am assuming that all the caching has warmed up
>>>> after
>>>> the first query.
>>>
>>> How many nodes are you querying? Did you try the setLimit(1), because
>>> then we know whether it is really the query that is slower.
>>> Furthermore, you do not have respectDocumentOrder set to true, as I
>>> think the response is so slow. OTOH, it might be that you are
>>> retrieving all the search results, so, you are in that case not
>>> testing the search but also the authorization of the nodes (or do you
>>> also iterate over the search results?).
>>>
>>> Regards Ard
>>>
>>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25757007.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>
>

Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
I've done a lot more digging and the slowdown seems to happen in the
SortedLuceneQueryHits class. Specifically in the first two lines of the
getHits() method. Line one creates a TopFieldDocCollector and line 2
searches the index with searcher.search(query, collector).

Since the query is trying to find a date that is >=, does this get turned
into a range query in lucene? Would that slow things down?



majohnst wrote:
> 
> I am using the ObjectContentManager to run my queries, so I am not able to
> set the setLimit(1). I also thought that since I have an order by clause,
> the order by clause would override anything the respect document order
> property does.
> 
> The code for my query is:
> 
> ObjectContentManager manager ... (injected from parent)
> String finalQuery = "//element(*,my:namespace)[@property='value' and
> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
> @datestart descending"
> ObjectIterator iter = (ObjectIterator)
> manager.getObjectIterator(finalQuery, Query.XPATH);
> 
> I then iterate over the ObjectIterator to grab the first result. Normally
> I would use iter.skip() to jump to a particular page in the search results
> and begin iterating from there.
> 
> 
> The ObjectContentManager.getObjectIterator() ends up calling
> getNodeIterator(). In getNodeIterator(), the code is:
> 
> Query jcrQuery =
> session.getWorkspace().getQueryManager().createQuery(query, language);
> QueryResult queryResult = jcrQuery.execute();
> NodeIterator nodeIterator = queryResult.getNodes();
> return nodeIterator;
> 
> 
> By stepping through the process, I can see that the major bottleneck
> happens in the NodeIterator for the line:
> QueryResult queryResult = jcrQuery.execute();
> 
> 
> So I really think that is has something to do with the query and not the
> other parts of the process. Plus, when the date field is not present in
> the query, the iterations and other methods all work fine without a
> problem. 
> 
> 
> 
> Ard Schrijvers-3 wrote:
>> 
>> <snip/>
>> 
>>> In both the 1.5 and 1.6 sequential test, I only saw a minor performance
>>> improvement in query time from query #1 to #2. But queries #2 to #20 all
>>> had
>>> the same query times. I am assuming that all the caching has warmed up
>>> after
>>> the first query.
>> 
>> How many nodes are you querying? Did you try the setLimit(1), because
>> then we know whether it is really the query that is slower.
>> Furthermore, you do not have respectDocumentOrder set to true, as I
>> think the response is so slow. OTOH, it might be that you are
>> retrieving all the search results, so, you are in that case not
>> testing the search but also the authorization of the nodes (or do you
>> also iterate over the search results?).
>> 
>> Regards Ard
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25757007.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
I am using the ObjectContentManager to run my queries, so I am not able to
set the setLimit(1). I also thought that since I have an order by clause,
the order by clause would override anything the respect document order
property does.

The code for my query is:

ObjectContentManager manager ... (injected from parent)
String finalQuery = "//element(*,my:namespace)[@property='value' and
@datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
@datestart descending"
ObjectIterator iter = (ObjectIterator) manager.getObjectIterator(finalQuery,
Query.XPATH);

I then iterate over the ObjectIterator to grab the first result. Normally I
would use iter.skip() to jump to a particular page in the search results and
begin iterating from there.


The ObjectContentManager.getObjectIterator() ends up calling
getNodeIterator(). In getNodeIterator(), the code is:

Query jcrQuery = session.getWorkspace().getQueryManager().createQuery(query,
language);
QueryResult queryResult = jcrQuery.execute();
NodeIterator nodeIterator = queryResult.getNodes();
return nodeIterator;


By stepping through the process, I can see that the major bottleneck happens
in the NodeIterator for the line:
QueryResult queryResult = jcrQuery.execute();


So I really think that is has something to do with the query and not the
other parts of the process. Plus, when the date field is not present in the
query, the iterations and other methods all work fine without a problem. 



Ard Schrijvers-3 wrote:
> 
> <snip/>
> 
>> In both the 1.5 and 1.6 sequential test, I only saw a minor performance
>> improvement in query time from query #1 to #2. But queries #2 to #20 all
>> had
>> the same query times. I am assuming that all the caching has warmed up
>> after
>> the first query.
> 
> How many nodes are you querying? Did you try the setLimit(1), because
> then we know whether it is really the query that is slower.
> Furthermore, you do not have respectDocumentOrder set to true, as I
> think the response is so slow. OTOH, it might be that you are
> retrieving all the search results, so, you are in that case not
> testing the search but also the authorization of the nodes (or do you
> also iterate over the search results?).
> 
> Regards Ard
> 
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25720351.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by Ard Schrijvers <a....@onehippo.com>.
<snip/>

> In both the 1.5 and 1.6 sequential test, I only saw a minor performance
> improvement in query time from query #1 to #2. But queries #2 to #20 all had
> the same query times. I am assuming that all the caching has warmed up after
> the first query.

How many nodes are you querying? Did you try the setLimit(1), because
then we know whether it is really the query that is slower.
Furthermore, you do not have respectDocumentOrder set to true, as I
think the response is so slow. OTOH, it might be that you are
retrieving all the search results, so, you are in that case not
testing the search but also the authorization of the nodes (or do you
also iterate over the search results?).

Regards Ard

Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
The way I tested this was to startup my repository, then create a group of
threads. Each thread would open a session and run the query. I did it this
way to simulate our webapp since every request coming into the webapp
creates a new jackrabbit session.

I re-ran the queries with 20 threads, so the query is being called 20 times
nearly simultaneously. The results still look the same.

1.5 = 13.3 seconds to run all 20 queries
1.6 = 36 seconds to run all 20 queries

I've run the test with up to 100 threads and the results only get worse as
more threads are added.

I also ran a seperate test (without threads) where the queries are run
sequentially instead of simultaneously.

1.5 = 19.3 seconds to run all 20 queries sequentially
1.6 = 51.1 seconds to run all 20 queries sequentially

In both the 1.5 and 1.6 sequential test, I only saw a minor performance
improvement in query time from query #1 to #2. But queries #2 to #20 all had
the same query times. I am assuming that all the caching has warmed up after
the first query.





majohnst wrote:
> 
> I am evaluating migrating from 1.5 to 1.6. I created several test cases
> that prove the query performance of 1.6 is the same or better than 1.5.
> That is until I add a date property into my query. 
> 
> In my tests, my base query is:
> //element(*,my:namespace)[@property='value'] order by @datestart
> descending
> 
> The time to run this query in 1.5 and 1.6 is:
> 1.5 = 1.5 seconds
> 1.6 = 1.5 seconds
> 
> If I add a date property:
> //element(*,my:namespace)[@property='value' and
> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
> @datestart descending
> 
> the results are:
> 1.5 = 1.5 seconds
> 1.6 = 3.5 seconds
> 
> What changed in version 1.6 that would cause the date performance to
> degrade so much?
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25716499.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Date Property Performance in 1.6

Posted by majohnst <ma...@lattaoutdoors.com>.
One quick update, I re-ran my 20 sequential query test using the query that
does not have a date in it.
//element(*,my:namespace)[@property='value'] order by @datestart descending

The results for that test are:
1.5 = 16 seconds to run all 20 queries
1.6 = 7.9 seconds to run all 20 queries

So there is definitely something about the date field that is causing a
performance issue. I am using OCM if that makes a difference.





majohnst wrote:
> 
> I am evaluating migrating from 1.5 to 1.6. I created several test cases
> that prove the query performance of 1.6 is the same or better than 1.5.
> That is until I add a date property into my query. 
> 
> In my tests, my base query is:
> //element(*,my:namespace)[@property='value'] order by @datestart
> descending
> 
> The time to run this query in 1.5 and 1.6 is:
> 1.5 = 1.5 seconds
> 1.6 = 1.5 seconds
> 
> If I add a date property:
> //element(*,my:namespace)[@property='value' and
> @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by
> @datestart descending
> 
> the results are:
> 1.5 = 1.5 seconds
> 1.6 = 3.5 seconds
> 
> What changed in version 1.6 that would cause the date performance to
> degrade so much?
> 

-- 
View this message in context: http://www.nabble.com/Date-Property-Performance-in-1.6-tp25704607p25716507.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.