Posted to dev@jackrabbit.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/08/16 16:15:15 UTC

[jira] Created: (JCR-2715) Improved join query performance

Improved join query performance
-------------------------------

                 Key: JCR-2715
                 URL: https://issues.apache.org/jira/browse/JCR-2715
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core, query
            Reporter: Jukka Zitting


Our current implementation of SQL2 join queries does not perform very well on pretty much any non-trivial data set.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (JCR-2715) Improved join query performance

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2715:
-------------------------------

         Assignee: Jukka Zitting
    Fix Version/s: 2.2.0

I'm currently working on this, targeting the 2.2 release. The optimization plan I'm following is:

1. Map single-selector queries directly to underlying Lucene queries, like we've so far done for XPath and SQL1
2. Split join queries into a set of per-selector queries, and combine these partial results into the join result set
3. When splitting join queries, use the results of the already executed left component query to turn the join condition into an extra constraint for the right component query
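
Steps 2 and 3 of this plan can be illustrated with a small, self-contained sketch. It does not use the Jackrabbit or Lucene APIs: the JoinSketch class, its row(), select() and innerEquiJoin() methods, and the in-memory lists standing in for the index are all hypothetical names chosen for illustration. The only idea shown is that the left selector is queried first, and the join values it returns become an extra constraint on the right selector query before the rows are combined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Jackrabbit QueryEngine.
class JoinSketch {

    /** Builds a row from alternating property names and values. */
    static Map<String, String> row(String... keyValues) {
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            row.put(keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    /** Stand-in for a per-selector query: filters an in-memory "index". */
    static List<Map<String, String>> select(
            List<Map<String, String>> index,
            Predicate<Map<String, String>> constraint) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Map<String, String> candidate : index) {
            if (constraint.test(candidate)) {
                rows.add(candidate);
            }
        }
        return rows;
    }

    /** Inner equi-join on left[leftProp] = right[rightProp]. */
    static List<Map<String, String>> innerEquiJoin(
            List<Map<String, String>> leftIndex,
            Predicate<Map<String, String>> leftConstraint, String leftProp,
            List<Map<String, String>> rightIndex,
            Predicate<Map<String, String>> rightConstraint, String rightProp) {
        // Step 2: execute the left per-selector query on its own.
        List<Map<String, String>> leftRows = select(leftIndex, leftConstraint);

        // Step 3: turn the join condition into an extra constraint for the
        // right query, based on the values found on the left side.
        Set<String> joinValues = new HashSet<>();
        for (Map<String, String> left : leftRows) {
            String value = left.get(leftProp);
            if (value != null) {
                joinValues.add(value);
            }
        }
        List<Map<String, String>> rightRows = select(rightIndex,
                rightConstraint.and(r -> joinValues.contains(r.get(rightProp))));

        // Combine the partial results: group right rows by join value,
        // then pair each left row with its matching right rows.
        Map<String, List<Map<String, String>>> rightByValue = new HashMap<>();
        for (Map<String, String> right : rightRows) {
            rightByValue
                    .computeIfAbsent(right.get(rightProp), k -> new ArrayList<>())
                    .add(right);
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> left : leftRows) {
            List<Map<String, String>> matches = rightByValue.getOrDefault(
                    left.get(leftProp), Collections.emptyList());
            for (Map<String, String> right : matches) {
                Map<String, String> joined = new HashMap<>();
                left.forEach((k, v) -> joined.put("left." + k, v));
                right.forEach((k, v) -> joined.put("right." + k, v));
                result.add(joined);
            }
        }
        return result;
    }
}
```

In this sketch the right-hand query never returns rows that cannot take part in the join, which is the point of step 3. How the extra constraint is best expressed against the real index is not covered here.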



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922469#action_12922469 ] 

Serge Huber commented on JCR-2715:
----------------------------------

Hello Jukka, 

I see you committed quite a lot of changes yesterday. Thank you very much! I have been looking at the changes. Do you still have more to commit? I have some trouble understanding how it currently works, as there seem to be no references to the QueryEngine in the code. I'm assuming you haven't "glued" it together yet; is that correct, and will you commit this soon? Or did I miss something?

Best regards, 
  Serge Huber.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922058#action_12922058 ] 

Serge Huber commented on JCR-2715:
----------------------------------

Thank you for your reply. I will pull the changes from SVN, test them and give you feedback.

I am using another unit test that does a lot of concurrent reads, writes and searches. Maybe this is something I could contribute, but it is not yet generic to Jackrabbit and currently has dependencies on our product. Basically we are testing with larger loads than the currently available tests do.

Regards,
  Serge Huber.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922030#action_12922030 ] 

Jukka Zitting commented on JCR-2715:
------------------------------------

Sorry about the delay on this, and thanks for the offer to help in testing! I'll start pushing my changes to svn now.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920495#action_12920495 ] 

Serge Huber commented on JCR-2715:
----------------------------------

We are also seeing this issue, so we would be very interested in the work you are doing. Is there anything already available to test?

Regards,
  Serge Huber.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921338#action_12921338 ] 

Serge Huber commented on JCR-2715:
----------------------------------

If I understand this ticket properly, this doesn't only happen for join queries but for all SQL-2 queries, no?

In the first step, do you mean you intend to map the single-selector queries directly to the underlying Lucene query, using BooleanQuery objects to map the constraints?

Anyway, I'd be willing to help in any way possible, as this has become the biggest performance issue we are seeing when testing Jackrabbit with non-trivial data sets and loads.

Regards,
  Serge Huber.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922595#action_12922595 ] 

Jukka Zitting commented on JCR-2715:
------------------------------------

As of revision 1024283 the new join implementation has been hooked up for handling inner equi-joins. Support for other join types still requires more work, and I still need to add the tighter Lucene integration.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922972#action_12922972 ] 

Jukka Zitting commented on JCR-2715:
------------------------------------

Good point about the limits! The current code applies limits before sorting, which is obviously wrong. I'll fix that.
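
To make the ordering problem concrete, here is a minimal sketch using plain Java streams rather than the actual query code. LimitSketch and its two methods are hypothetical names, and rows are reduced to simple strings. It only shows why truncating before sorting returns different rows than sorting the full result and then truncating.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the Jackrabbit implementation.
class LimitSketch {

    /** Wrong order: truncates first, then sorts only the surviving rows. */
    static List<String> limitThenSort(List<String> rows, int limit) {
        return rows.stream()
                .limit(limit)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Intended order: sorts the full result, then keeps the first rows. */
    static List<String> sortThenLimit(List<String> rows, int limit) {
        return rows.stream()
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

For the rows "c", "a", "b" and a limit of 2, limitThenSort returns "a", "c" while sortThenLimit returns "a", "b". Only the second is the first two rows of the sorted result.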



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922476#action_12922476 ] 

Jukka Zitting commented on JCR-2715:
------------------------------------

Yes, I'm still working on fixing some remaining issues before I plug the new implementation in as the QueryObjectModelImpl.execute() method. I also have some pending work on integrating the new join query code more tightly with the underlying Lucene index, so that we can avoid the extra layer of SQL1 queries that the implementation now uses.

For now it's possible to use this implementation by directly instantiating the QueryEngine class like this:

    QueryObjectModel qom = ...;
    QueryResult result = new QueryEngine(session).execute(
            qom.getColumns(), qom.getSource(),
            qom.getConstraint(), qom.getOrderings());




[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923846#action_12923846 ] 

Serge Huber commented on JCR-2715:
----------------------------------

Hi Jukka, 

I just tested the latest commits, and they look quite good. I only saw 2 tests that don't seem to work yet, but I assume you're already aware of this?

EquiJoinConditionTest (org.apache.jackrabbit.test.api.query.qom.EquiJoinConditionTest)

    testInnerJoin1         0.692
    testInnerJoin2         0.608
    testRightOuterJoin1    0.138    |/testroot/node1| is not part of the result set
    testRightOuterJoin2    0.09
    testLeftOuterJoin1     0.103
    testLeftOuterJoin2     0.071    /testroot/node1|| is not part of the result set

Best regards,
  Serge Huber.



[jira] Commented: (JCR-2715) Improved join query performance

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JCR-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922872#action_12922872 ] 

Serge Huber commented on JCR-2715:
----------------------------------

Thanks for all the work! I've tested the QueryEngine and results do indeed seem to be faster, although I hit an issue when reading results twice, and I'm not sure whether it is due to my own code.

Also, concerning limits, do they currently work with sorting? If I limit to 100 results, will I get the first 100 results of the sorted set? Or is sorting done after limiting?

Let me know how I can help,
Best regards,
  Serge Huber.
