Posted to dev@jackrabbit.apache.org by "Matt Johnston (JIRA)" <ji...@apache.org> on 2009/10/09 22:23:31 UTC

[jira] Created: (JCR-2353) Poor performance in range queries using dates

Poor performance in range queries using dates
---------------------------------------------

                 Key: JCR-2353
                 URL: https://issues.apache.org/jira/browse/JCR-2353
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
    Affects Versions: 1.6.0
            Reporter: Matt Johnston


I am evaluating migrating from 1.5 to 1.6. I created several test cases that show the query performance of 1.6 is the same as or better than 1.5. That is, until I add a date property to my query. The repository has 400,000 nodes. Each node has several string-based properties (@property, @property2, ...) and a date-based property (@datestart). Every node has a relatively unique datestart and the total date range spans 6 years.

In my tests, my base query is:
//element(*,my:namespace)[@property='value'] order by @datestart descending

The time to run this query in 1.5 and 1.6 is:
1.5 = 1.5 seconds
1.6 = 1.5 seconds

If I add a range constraint on the date property:
//element(*,my:namespace)[@property='value' and @datestart<=xs:dateTime('2009-09-24T11:53:23.293-05:00')] order by @datestart descending

the results are:
1.5 = 1.5 seconds
1.6 = 3.5 seconds 

I have isolated the slowdown to the implementation of SortedLuceneQueryHits. SortedLuceneQueryHits is not present in 1.5. I have run one version of the test in which the query is run 20 times simultaneously and another in which it is run 20 times sequentially. In both tests I do see evidence that caching is taking place, but it provides only very minor performance gains. Also, running the 1.6 query multiple times does not decrease the query time dramatically.
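
For anyone who wants to repeat the sequential and simultaneous runs described above, a minimal timing harness could look like the following. This is only a sketch, not the code used for the numbers in this report: the class name QueryTimingHarness, its method names, and the sleeping placeholder workload are all assumptions, and the Runnable would need to be replaced by code that executes the query and iterates over the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness; names are illustrative and not part of Jackrabbit.
class QueryTimingHarness {

    /** Runs the task the given number of times, one after another; returns elapsed milliseconds. */
    public static long timeSequential(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Runs the task the given number of times on separate threads; returns elapsed milliseconds. */
    public static long timeConcurrent(Runnable task, int runs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(runs);
        try {
            List<Callable<Void>> jobs = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                jobs.add(() -> { task.run(); return null; });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every job has finished
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get(); // rethrows any failure from a worker thread
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption): stands in for executing the query.
        Runnable query = () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("sequential ms: " + timeSequential(query, 20));
        System.out.println("concurrent ms: " + timeConcurrent(query, 20));
    }
}
```

Comparing the two totals against a single run gives a rough idea of how much repeated or parallel execution benefits from caching.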

http://www.nabble.com/Date-Property-Performance-in-1.6-td25704607.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (JCR-2353) Poor performance in range queries using dates

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770564#action_12770564 ] 

Marcel Reutegger commented on JCR-2353:
---------------------------------------

Oops, that commit went to the 1.6 branch, but should have gone to trunk.

Reverted previous commit in revision: 830246

and committed the change to trunk in revision: 830238



[jira] Commented: (JCR-2353) Poor performance in range queries using dates

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770046#action_12770046 ] 

Marcel Reutegger commented on JCR-2353:
---------------------------------------

It turns out the slowdown is not related to SortedLuceneQueryHits. With Jackrabbit 1.6 we also switched to Lucene 2.4.1, and a special case in our RangeQuery implementation became obsolete [0]. It seems the Lucene replacement is a bit slower than what we had previously.

I'll revert the change to this class.

Please also note that you can reduce the query time further when you set a limit on the number of results. However, the method Query.setLimit() is only available in JCR 2.0. For 1.6 you'd have to cast the query to QueryImpl.
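
To illustrate the limit suggestion: on 1.6 the direct route is the cast to QueryImpl mentioned above, followed by a call to its setLimit method. The sketch below is a dependency-free variant that looks the method up reflectively, so the same helper works whether the object is a JCR 2.0 Query or a 1.6 QueryImpl. It assumes the query object exposes a public setLimit(long) method; the QueryLimit class, the applyLimit helper, and the StubQuery stand-in are hypothetical names used only for this example.

```java
import java.lang.reflect.Method;

// Hypothetical helper; not part of JCR or Jackrabbit.
class QueryLimit {

    /**
     * Calls setLimit(long) on the given query object if it has such a public method.
     * Returns true if the limit was applied, false if no such method exists.
     */
    public static boolean applyLimit(Object query, long limit) throws Exception {
        Method setLimit;
        try {
            setLimit = query.getClass().getMethod("setLimit", long.class);
        } catch (NoSuchMethodException e) {
            return false; // this query type offers no limit support
        }
        setLimit.invoke(query, limit);
        return true;
    }

    /** Stand-in for a query implementation (assumption), used only to exercise applyLimit. */
    public static class StubQuery {
        public long limit = -1;

        public void setLimit(long limit) {
            this.limit = limit;
        }
    }

    public static void main(String[] args) throws Exception {
        StubQuery q = new StubQuery();
        System.out.println(applyLimit(q, 20) + " " + q.limit); // prints "true 20"
        System.out.println(applyLimit("not a query", 20));     // prints "false"
    }
}
```

In real code the helper would be called on the object returned by QueryManager.createQuery() before execute(); where compile-time access to QueryImpl is available, the plain cast is simpler.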

[0] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/RangeQuery.java?r1=752064&r2=756444



[jira] Resolved: (JCR-2353) Poor performance in range queries using dates

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-2353.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0

Fixed in revision: 829823
