Posted to dev@jackrabbit.apache.org by "Serge Huber (JIRA)" <ji...@apache.org> on 2010/12/08 08:39:01 UTC

[jira] Created: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Poor performance of ISDESCENDANTNODE on SQL 2 queries
-----------------------------------------------------

                 Key: JCR-2835
                 URL: https://issues.apache.org/jira/browse/JCR-2835
             Project: Jackrabbit Content Repository
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Serge Huber



Using the latest source code, I have noticed very poor performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For example, the query:

select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc

executes in 600 ms, whereas the same query without the constraint:

select * from [jnt:news] as news order by news.[date] desc

executes in 4 ms.

From looking at the problem in the YourKit profiler, the culprit appears to be the constraint building, which uses recursive Lucene searches to build the list of descendant node IDs:

    private Query getDescendantNodeQuery(
            DescendantNode dn, JackrabbitIndexSearcher searcher)
            throws RepositoryException, IOException {
        BooleanQuery query = new BooleanQuery();

        try {
            // Breadth-first walk of the sub-tree, starting at the ancestor node.
            LinkedList<NodeId> ids = new LinkedList<NodeId>();
            NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
            ids.add(ancestor.getNodeId());
            while (!ids.isEmpty()) {
                String id = ids.removeFirst().toString();
                // One Lucene query per visited node: find the nodes whose parent is this ID.
                Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
                QueryHits hits = searcher.evaluate(q);
                ScoreNode sn = hits.nextScoreNode();
                if (sn != null) {
                    // Only parents that actually have children contribute a clause.
                    query.add(q, SHOULD);
                    do {
                        // Queue each child so that its own children are looked up in turn.
                        ids.add(sn.getNodeId());
                        sn = hits.nextScoreNode();
                    } while (sn != null);
                }
            }
        } catch (PathNotFoundException e) {
            query.add(new JackrabbitTermQuery(new Term(
                    FieldNames.UUID, "invalid-node-id")), // never matches
                    SHOULD);
        }

        return query;
    }

In the above example this generates over 2800 Lucene queries, which accounts for the slowdown. I wonder whether it would be faster to collect the descendant IDs by traversing the child nodes through the JCR API instead?

This was probably missed because there do not seem to be any performance tests for this constraint.
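
To make the query count concrete: the loop above evaluates one parent lookup for every node it dequeues, so the number of Lucene queries equals the number of nodes in the sub-tree (ancestor included). The following is only a toy model of that loop, not Jackrabbit code; the class name DescendantQueryCount, the method countParentLookups, and the in-memory map standing in for the index are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the breadth-first expansion; names here are hypothetical. */
class DescendantQueryCount {

    /**
     * Counts how many parent lookups the breadth-first expansion performs.
     * The map is a stand-in for the index: parent ID -> IDs of its children.
     */
    static int countParentLookups(Map<String, List<String>> childrenByParent,
                                  String ancestorId) {
        Deque<String> pending = new ArrayDeque<>();
        pending.add(ancestorId);
        int lookups = 0;
        while (!pending.isEmpty()) {
            String id = pending.removeFirst();
            // One lookup per dequeued node, mirroring one searcher.evaluate(q) call.
            lookups++;
            // Leaf nodes still cost a lookup; they just return no children.
            pending.addAll(childrenByParent.getOrDefault(id, List.of()));
        }
        return lookups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("site", List.of("news", "events"));
        index.put("news", List.of("n1", "n2", "n3"));
        index.put("events", List.of("e1"));
        // Seven nodes in the sub-tree, so seven lookups.
        System.out.println(countParentLookups(index, "site"));
    }
}
```

Note that leaf nodes are the expensive part in practice: each one triggers a lookup that returns nothing, and a wide content tree consists mostly of leaves.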

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972384#action_12972384 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Also, maybe we should backport this to 2.2.1?

Regards,
  Serge Huber.



> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png
>
>

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Fix Version/s: 2.2.1

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.2.1, 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png
>
>

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Attachment: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch


I am attaching a patch to LuceneQueryFactory that replaces the recursive Lucene queries with a JCR sub-tree traversal. In my tests this is roughly three times faster, but it is still slow when the sub-tree contains a lot of nodes.

I welcome any feedback you may have. I am also ready to commit this if you'd like.

Best regards,
  Serge Huber.
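
As a rough illustration of what a sub-tree traversal looks like (this is not the attached patch), the sketch below collects descendant IDs by walking child lists directly instead of querying an index. TreeNode and SimpleNode are hypothetical stand-ins defined here for self-containment; they are not the javax.jcr.Node interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative traversal over a hypothetical node type; not Jackrabbit code. */
class SubtreeIds {

    /** Hypothetical stand-in for a repository node. */
    interface TreeNode {
        String id();
        List<TreeNode> children();
    }

    /** Minimal immutable implementation used only for the demonstration. */
    static final class SimpleNode implements TreeNode {
        private final String id;
        private final List<TreeNode> children;

        SimpleNode(String id, TreeNode... children) {
            this.id = id;
            this.children = List.of(children);
        }

        public String id() { return id; }
        public List<TreeNode> children() { return children; }
    }

    /** Returns the IDs of all descendants of the ancestor (ancestor excluded), breadth-first. */
    static List<String> descendantIds(TreeNode ancestor) {
        List<String> ids = new ArrayList<>();
        Deque<TreeNode> pending = new ArrayDeque<>(ancestor.children());
        while (!pending.isEmpty()) {
            TreeNode node = pending.removeFirst();
            ids.add(node.id());
            // Children come straight from the node, with no index query involved.
            pending.addAll(node.children());
        }
        return ids;
    }

    public static void main(String[] args) {
        TreeNode site = new SimpleNode("site",
                new SimpleNode("news", new SimpleNode("n1"), new SimpleNode("n2")),
                new SimpleNode("events", new SimpleNode("e1")));
        System.out.println(descendantIds(site));
    }
}
```

The traversal still visits every node once, which is consistent with the observation that the approach remains slow on large sub-trees: it removes the per-node query overhead but not the linear cost.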

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>             Fix For: 2.2.0, 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Affects Version/s: 2.3.0
                       2.2.1
               Status: Patch Available  (was: Open)

I am attaching a first pass at the descendant search tests.
These tests were run on trunk WITHOUT the proposed patch. I will work on implementing Jukka's proposal now that I have the tests.

Please review the XPath test, as I am not that fluent in XPath queries.

The current difference is huge (provided my tests are correct):

XPath:

# DescendantSearchTest                   min     10%     50%     90%     max
2.2                                       25      34      43      59     265

SQL-2:

# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.2                                   395318  395318  395318  395318  395318

If the test implementations look OK, I can commit them once reviewed.

Best regards,
   Serge Huber.
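
For readers unfamiliar with the min / 10% / 50% / 90% / max columns, here is a minimal sketch of how such a row could be derived from a set of timing samples. It assumes the nearest-rank percentile definition, which may differ from what the performance test harness actually uses; TimingSummary and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Sketch of a five-number timing summary; the percentile definition is an assumption. */
class TimingSummary {

    /** Nearest-rank percentile of an ascending-sorted, non-empty array (percent in 1..100). */
    static long percentile(long[] sorted, int percent) {
        // Integer ceiling of percent/100 * n, so no floating-point rounding is involved.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Returns {min, 10%, 50%, 90%, max} for a non-empty array of samples. */
    static long[] summarize(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return new long[] {
            sorted[0],
            percentile(sorted, 10),
            percentile(sorted, 50),
            percentile(sorted, 90),
            sorted[sorted.length - 1]
        };
    }

    public static void main(String[] args) {
        // With a single sample, every column collapses to the same value.
        System.out.println(Arrays.toString(summarize(new long[] {395318})));
    }
}
```

Under this definition, a row in which all five columns are identical, like the SQL-2 row above, is what a single completed iteration would produce.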

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>            Assignee: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970436#action_12970436 ] 

Jukka Zitting commented on JCR-2835:
------------------------------------

BTW, when adding new files, remember to always include the Apache license header. I added the header to the new test classes in revision 1044613 to fix the Hudson failure caused by this.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Assigned: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber reassigned JCR-2835:
--------------------------------

    Assignee: Serge Huber

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>            Assignee: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970715#action_12970715 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Sorry about that, Jukka, my bad. I didn't know this could cause Hudson to fail.

Btw, unfortunately I haven't had time to test your proposal. I was comparing the Lucene queries generated by the XPath and SQL-2 tests, and saw that the DescendantChildNodeQuery is used for XPath but not for SQL-2. I'm not (yet) a Lucene expert, but maybe that's a place to start?

I also noticed that SimpleQueryResult does not support a result fetch size, as SingleColumnQueryResult and MultipleColumnQueryResult do. I realize this is because of the join merging, but maybe we should look at doing "progressive" merging so that fewer results are loaded up front. Again, I haven't thought this through completely, and there may be some limitation that prevents it.

These query problems are difficult because we are basically rewriting a full-fledged SQL optimizer; maybe we should look at how databases handle this?

Regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970024#action_12970024 ] 

Jukka Zitting commented on JCR-2835:
------------------------------------

Thanks for adding the test! Feel free to commit it.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>

[jira] Issue Comment Edited: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970100#action_12970100 ] 

Serge Huber edited comment on JCR-2835 at 12/10/10 3:33 AM:
------------------------------------------------------------

Ok I have committed the perf tests in revision 1044239. 

I will be working on trying out your suggestion today.

Best regards,
  Serge Huber.

      was (Author: bhillou):
    Ok I have committed the perf tests in revision 1043897. 

I will be working on trying out your suggestion today.

Best regards,
  Serge Huber.
  
> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972436#action_12972436 ] 

Jukka Zitting commented on JCR-2835:
------------------------------------

+1 Looks great!

Re: 2.2.1; Yes, I think it would be a good idea to backport this even though strictly speaking it's not just a bug fix. We don't introduce any externally visible API or behaviour changes, and this change should also fix the "too many clauses" problem already reported on users@.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Attachment: JCR-2835_PerformanceTests.patch

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>            Assignee: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Attachment: SQL2DescendantSearchTest.png
                DescendantSearchTest.png

I have generated the performance graphs, and this patch does indeed look really good!

By the way, I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from Fink:

imagemagick2-svg
gnuplot

I think we might want to add that to the README.txt in case others try to generate the graphs on Mac OS X.

If nobody has any objections, I'd like to commit this patch, since the results are much better.

Best regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969649#action_12969649 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Hello Jukka, 

Thanks again for your quick answer.

Yes, agreed, we should provide a test case. Where should it be included? I might have the opportunity to help develop it, but I don't really know the best place to add such a case.

The level-based approach is interesting. It would indeed reduce the number of queries, although the clauses could get quite large; I'm not sure whether that's an issue for Lucene.
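
To make the trade-off concrete, here is a minimal sketch that reads the suggestion as "one query per tree level, OR-ing together all parent IDs of that level". That reading is an assumption, since the original proposal is not quoted in this message, and the index is again modelled as an in-memory map. `LevelBatching` and `walkByLevel` are hypothetical names, not Jackrabbit code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a level-by-level descendant walk; not Jackrabbit code. */
final class LevelBatching {

    /**
     * Returns {lookups, largestClauseCount}. Each level costs a single
     * lookup, but that lookup needs one clause per parent ID in the level.
     */
    public static int[] walkByLevel(Map<String, List<String>> children, String root) {
        List<String> level = Collections.singletonList(root);
        int lookups = 0;
        int largest = 0;
        while (!level.isEmpty()) {
            lookups++;                                  // one OR query for the whole level
            largest = Math.max(largest, level.size());  // one clause per parent ID
            List<String> next = new ArrayList<>();
            for (String id : level) {
                next.addAll(children.getOrDefault(id, Collections.<String>emptyList()));
            }
            level = next;
        }
        return new int[] {lookups, largest};
    }

    public static void main(String[] args) {
        // Hypothetical tree: site -> {a, b}, a -> {a1, a2}
        Map<String, List<String>> children = new HashMap<>();
        children.put("site", Arrays.asList("a", "b"));
        children.put("a", Arrays.asList("a1", "a2"));
        int[] stats = walkByLevel(children, "site");
        System.out.println(stats[0] + " lookups, widest query has " + stats[1] + " clauses");
    }
}
```

The number of lookups drops to the depth of the sub-tree, but the widest level determines the clause count of a single query. Lucene's BooleanQuery has a configurable clause limit (1024 by default), so a wide level could run into it.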

Best regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Thomas Draier (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971692#action_12971692 ] 

Thomas Draier commented on JCR-2835:
------------------------------------

Hi,

I just made some improvements to the DescendantNode constraint, using the same kind of subquery we use in XPath (DescendantSelfAxisQuery).

First, I had to change the XPath test slightly to make it more comparable with the SQL-2 one, as the current query in DescendantSearchTest does not return any results :-)

So instead of: "/testroot//*[@testcount=" + i + "]"
I used: "/jcr:root/testroot//element(*,nt:base)[@testcount=" + i + "]"
(I added jcr:root and an nt:base constraint to have the same constraint as in SQL-2. By the way, this could also be improved, as a constraint on nt:base does not make much sense and does not need to be expanded to all subtypes.)

Before patching, I had these figures, similar to what Serge got before:

# DescendantSearchTest                   min     10%     50%     90%     max
2.2                                      411     416     430     450     690
# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.2                                   203530  203530  203530  203530  203530

After patching:

# DescendantSearchTest                   min     10%     50%     90%     max
2.3                                      420     429     448     479    1208
# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.3                                      319     327     339     351     375

This makes the SQL-2 queries even faster than the XPath ones. Basically, I use a DescendantSelfAxisQuery with subqueries when possible. Compared to XPath, the context query (the one that gets the ancestor node) is simpler, as it is based on the node ID instead of nested ChildAxisQuery queries, which may explain why SQL-2 is slightly faster.
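
For a sense of scale, the medians (50% column) in the tables above can be turned into ratios. This is plain arithmetic on the posted figures; `SpeedupFromMedians` and `speedup` are hypothetical names used only for this sketch.

```java
/** Arithmetic on the median figures posted above; names are illustrative. */
final class SpeedupFromMedians {

    /** Ratio of the old timing to the new timing (greater than 1 means faster). */
    public static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // SQL2DescendantSearchTest medians: 203530 before, 339 after
        System.out.printf("SQL-2 median speedup: %.0fx%n", speedup(203530, 339));
        // DescendantSearchTest (XPath) medians: 430 before, 448 after
        System.out.printf("XPath median ratio: %.2f%n", speedup(430, 448));
    }
}
```

On these numbers the SQL-2 test improves by a factor of roughly 600, while the XPath test stays essentially unchanged (ratio about 0.96).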

For example, an XPath query like "/jcr:root/folder1/folder2//element(*,nt:type)"
is translated to:

+DescendantSelfAxisQuery(
      +ChildAxisQuery(
            +ChildAxisQuery(
                 _:PARENT:, 
                 {}folder1), 
            {}folder2), 
      +_:PROPERTIES:1570322:primaryType[14877513:type, 
      1)

whereas the equivalent "select * from [nt:type] as obj where ISDESCENDANTNODE(obj, '/folder1/folder2')" gives:
      
DescendantSelfAxisQuery(_:UUID:a4137e73-6a16-4148-9d61-2353230a15d0, 
      +_:PROPERTIES:1570322:primaryType[14877513:type, 
      1)

Note that this currently only works at the first level of constraints: an ISDESCENDANTNODE constraint inside an OR / NOT boolean query won't use the subquery. I don't think that's a big issue for OR, but it can be for NOT.

The patch is attached.
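
The idea behind the sub-query approach described above can be sketched as follows: evaluate the cheap constraint first (for example the node type), then keep only the hits whose ancestor chain reaches the context node. This is a simplified model under stated assumptions, not the DescendantSelfAxisQuery implementation. The parent relation is an in-memory map, and `DescendantFilter`, `isDescendant` and `filter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of "filter sub-query hits by ancestor"; not Jackrabbit code. */
final class DescendantFilter {

    /** True if node lies strictly below ancestor according to the parent map. */
    public static boolean isDescendant(Map<String, String> parentOf, String node, String ancestor) {
        String current = parentOf.get(node);
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parentOf.get(current); // walk one level up
        }
        return false;
    }

    /** Keeps only the candidates (sub-query hits) located below ancestor. */
    public static List<String> filter(Map<String, String> parentOf,
                                      List<String> candidates, String ancestor) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (isDescendant(parentOf, candidate, ancestor)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: root -> site -> a -> a1, and other -> x
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("site", "root");
        parentOf.put("a", "site");
        parentOf.put("a1", "a");
        parentOf.put("x", "other");
        System.out.println(filter(parentOf, Arrays.asList("a1", "x", "site"), "site"));
    }
}
```

The cost here scales with the number of sub-query hits and their depth, rather than with the total number of nodes below the ancestor, which is the property that matters for a large sub-tree with few matching nodes.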

Regards


> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Issue Comment Edited: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972379#action_12972379 ] 

Serge Huber edited comment on JCR-2835 at 12/17/10 2:50 AM:
------------------------------------------------------------

I have generated the performance graphs, and this patch does indeed look really good!

By the way, I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from Fink:

imagemagick2-svg
gnuplot

I have added instructions to the README.txt on how to use the script under Mac OS X to generate the graphs.

If nobody has any objections, I'd like to commit this patch, since the results are much better.

Best regards,
  Serge Huber.

      was (Author: bhillou):
    I have generated the performance graphs and indeed this patch looks really good !

Btw I had a lot of trouble generating the graphs under Mac OS X. It took me a while to understand that I needed to install the following packages from fink : 

imagemagick2-svg
gnuplot

I think we might want to add that to the README.txt if there are others trying to use Mac OS X to generate the graphs.

If nobody has any objections, I'd like to commit this patch since the results are really much better ?

Best regards,
  Serge Huber.
  
> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2835:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Merged to the 2.2 branch in revision 1054716.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.2.1, 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969838#action_12969838 ] 

Serge Huber commented on JCR-2835:
----------------------------------

I just tested with the patch I proposed here. The results are slightly better, but still very far from the XPath implementation:

# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.2                                   224662  224662  224662  224662  224662

Best regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>
> Using the latest source code, I have noticed very bad performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For example, the query : 
> select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc 
> executes in 600ms 
> select * from [jnt:news] as news order by news.[date] desc
> executes in 4ms
> From looking at the problem in the Yourkit profiler, it seems that the culprit is the constraint building, that uses recursive Lucene searches to build the list of descendant node IDs : 
>     private Query getDescendantNodeQuery(
>             DescendantNode dn, JackrabbitIndexSearcher searcher)
>             throws RepositoryException, IOException {
>         BooleanQuery query = new BooleanQuery();
>         try {
>             LinkedList<NodeId> ids = new LinkedList<NodeId>();
>             NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
>             ids.add(ancestor.getNodeId());
>             while (!ids.isEmpty()) {
>                 String id = ids.removeFirst().toString();
>                 Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
>                 QueryHits hits = searcher.evaluate(q);
>                 ScoreNode sn = hits.nextScoreNode();
>                 if (sn != null) {
>                     query.add(q, SHOULD);
>                     do {
>                         ids.add(sn.getNodeId());
>                         sn = hits.nextScoreNode();
>                     } while (sn != null);
>                 }
>             }
>         } catch (PathNotFoundException e) {
>             query.add(new JackrabbitTermQuery(new Term(
>                     FieldNames.UUID, "invalid-node-id")), // never matches
>                     SHOULD);
>         }
>         return query;
>     }
> In the above example this generates over 2800 Lucene queries, which is the culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the JCR to retrieve the list of child IDs ?
> This was probably also missed because I didn't seem to find any performance tests on this constraint.

[jira] Assigned: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber reassigned JCR-2835:
--------------------------------

    Assignee:     (was: Serge Huber)

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2835:
-------------------------------

      Component/s: query
                   jackrabbit-core
      Description: 
Using the latest source code, I have noticed very bad performance on SQL-2 queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For example, the query : 

select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') order by news.[date] desc 

executes in 600ms 

select * from [jnt:news] as news order by news.[date] desc

executes in 4ms

From looking at the problem in the Yourkit profiler, it seems that the culprit is the constraint building, that uses recursive Lucene searches to build the list of descendant node IDs : 

    private Query getDescendantNodeQuery(
            DescendantNode dn, JackrabbitIndexSearcher searcher)
            throws RepositoryException, IOException {
        BooleanQuery query = new BooleanQuery();

        try {
            LinkedList<NodeId> ids = new LinkedList<NodeId>();
            NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
            ids.add(ancestor.getNodeId());
            while (!ids.isEmpty()) {
                String id = ids.removeFirst().toString();
                Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
                QueryHits hits = searcher.evaluate(q);
                ScoreNode sn = hits.nextScoreNode();
                if (sn != null) {
                    query.add(q, SHOULD);
                    do {
                        ids.add(sn.getNodeId());
                        sn = hits.nextScoreNode();
                    } while (sn != null);
                }
            }
        } catch (PathNotFoundException e) {
            query.add(new JackrabbitTermQuery(new Term(
                    FieldNames.UUID, "invalid-node-id")), // never matches
                    SHOULD);
        }

        return query;
    }

In the above example this generates over 2800 Lucene queries, which is the culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the JCR to retrieve the list of child IDs ?

This was probably also missed because I didn't seem to find any performance tests on this constraint.

    Fix Version/s:     (was: 2.2.0)
       Issue Type: Improvement  (was: Bug)

This probably won't make it in the 2.2 timeframe, so targeting just 2.3 for now.

When fixing this, we should start by adding a test case to the performance test suite (possibly with an option to compare SQL2 against XPath). That way we'll have solid numbers to back any improvement ideas.

In general subtree queries are troublesome even for XPath, but there's still a lot we can do to bring SQL2 at least close to that level. One idea I had (but didn't yet have time to implement) was to collect the parent UUIDs of the entire subtree not by traversing each node separately, but by per-level queries like 1) parent-id=<uuid-of-ancestor>, 2) parent-id=<uuid-of-child1> parent-id=<uuid-of-child2> ..., 3) same for grandchildren, etc. That way we'd only need to execute a handful of Lucene queries per one subtree constraint.
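
A minimal sketch of the per-level idea described above. It assumes a plain in-memory map in place of the Lucene index, so it shows only the traversal pattern and not real Jackrabbit code. The class LevelWiseDescendants and its methods are hypothetical names introduced for illustration; each call to queryChildrenOf stands in for one Lucene query that ORs together a parent-id term for every node of the current level.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only: collects a subtree with one lookup per tree level. */
final class LevelWiseDescendants {

    /** Hypothetical stand-in for the index: parent id -> child ids. */
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    /** Number of batched lookups ("queries") executed so far. */
    private int queryCount = 0;

    void addChild(String parentId, String childId) {
        childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
    }

    /** One "query": the union of the children of every id in the given level. */
    private Set<String> queryChildrenOf(Set<String> parentIds) {
        queryCount++;
        Set<String> children = new LinkedHashSet<>();
        for (String parentId : parentIds) {
            children.addAll(childrenByParent.getOrDefault(parentId, Collections.emptyList()));
        }
        return children;
    }

    /** Returns all descendant ids of the ancestor, level by level. */
    Set<String> descendantsOf(String ancestorId) {
        Set<String> descendants = new LinkedHashSet<>();
        Set<String> level = Collections.singleton(ancestorId);
        while (!level.isEmpty()) {
            Set<String> next = queryChildrenOf(level);
            next.removeAll(descendants); // guard against revisiting an id
            descendants.addAll(next);
            level = next;
        }
        return descendants;
    }

    int getQueryCount() {
        return queryCount;
    }
}
```

With this pattern the number of lookups grows with the depth of the subtree rather than with the number of nodes: a subtree with an ancestor, two children and three grandchildren needs three lookups (the last one returns nothing and ends the loop), where a node-by-node traversal would issue six.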

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970100#action_12970100 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Ok I have committed the perf tests in revision 1043897. 

I will be working on trying out your suggestion today.

Best regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Huber updated JCR-2835:
-----------------------------

    Fix Version/s: 2.3.0
                   2.2.0

Added fix version, please correct if needed.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>             Fix For: 2.2.0, 2.3.0

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972440#action_12972440 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Thanks Jukka, will you take care of the backport or should I do it ?

Regards,
  Serge... 

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.2.1, 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Updated: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Thomas Draier (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Draier updated JCR-2835:
-------------------------------

    Attachment: JCR-2835-use-DescendantSelfAxisQuery.patch

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Serge Huber (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972444#action_12972444 ] 

Serge Huber commented on JCR-2835:
----------------------------------

Ok I have committed this in the trunk, as revision 1050346

Regards,
  Serge Huber.

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.2.1, 2.3.0
>
>         Attachments: DescendantSearchTest.png, JCR-2835-use-DescendantSelfAxisQuery.patch, JCR-2835_PerformanceTests.patch, JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch, SQL2DescendantSearchTest.png

[jira] Commented: (JCR-2835) Poor performance of ISDESCENDANTNODE on SQL 2 queries

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969700#action_12969700 ] 

Jukka Zitting commented on JCR-2835:
------------------------------------

See the ./test/performance directory within Jackrabbit trunk. I just added some instructions in the README.txt file for adding new tests. The simplest way to get started is to copy and adapt the SimpleSearchTest and SQL2SearchTest classes to something like DescendantSearchTest and SQL2DescendantSearchTest classes.

Re: large clauses; Yes, we'd probably need some extra code that automatically splits the queries into smaller subqueries, as is currently done by the QueryEngine.execute() method for large joins.
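
The splitting step could look roughly like the sketch below. It is not the QueryEngine.execute() code, only a generic illustration: ClauseBatcher and split are hypothetical names, and the limit of 1024 used in the note after the code is just an example value (it is Lucene's default BooleanQuery clause limit; this thread does not say which limit Jackrabbit actually applies).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: cuts a long clause list into bounded batches. */
final class ClauseBatcher {

    private ClauseBatcher() {
    }

    /**
     * Splits the clauses into consecutive batches of at most maxPerQuery
     * elements, preserving order. Each batch would become one subquery.
     */
    static <T> List<List<T>> split(List<T> clauses, int maxPerQuery) {
        if (maxPerQuery <= 0) {
            throw new IllegalArgumentException("maxPerQuery must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < clauses.size(); start += maxPerQuery) {
            int end = Math.min(start + maxPerQuery, clauses.size());
            // Copy the view so that each batch is independent of the input list.
            batches.add(new ArrayList<>(clauses.subList(start, end)));
        }
        return batches;
    }
}
```

For example, 2500 parent ids with a limit of 1024 would be split into three subqueries of 1024, 1024 and 452 clauses, whose results are then merged.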

> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch