Posted to dev@manifoldcf.apache.org by "Karl Wright (Created) (JIRA)" <ji...@apache.org> on 2011/10/26 18:41:32 UTC

[jira] [Created] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

HSQLDB load test runs out of memory and has lots of very slow queries
---------------------------------------------------------------------

                 Key: CONNECTORS-284
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-284
             Project: ManifoldCF
          Issue Type: Bug
          Components: Framework core
    Affects Versions: ManifoldCF 0.4
            Reporter: Karl Wright
            Assignee: Karl Wright
             Fix For: ManifoldCF 0.4


Some of the long-running queries are as follows:

- document stuffer query
- locating carrydown information



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159768#comment-13159768 ] 

Karl Wright commented on CONNECTORS-284:
----------------------------------------

The HSQLDB team has promised me a switch that makes the engine read preferentially from an index when that index is used to order the result.  I'm waiting on that fix before completing this ticket.

                

        

[jira] [Commented] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140182#comment-13140182 ] 

Karl Wright commented on CONNECTORS-284:
----------------------------------------

The inequality issue is fixed.  With that in place, I've completed a comprehensive review of all indexes and how they are used, and made some adjustments in r1195471.

                

        

[jira] [Resolved] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-284.
------------------------------------

    Resolution: Fixed

r1210307.

                

        

[jira] [Commented] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142390#comment-13142390 ] 

Karl Wright commented on CONNECTORS-284:
----------------------------------------

Everything looks good for HSQLDB except for two things:

(1) The stuffer query does not read preferentially out of the order-by index, and there seems to be no good way to make HSQLDB do that.  The rule is that the ORDER BY column must be mentioned in the WHERE clause for this to happen, which means saying something like "docpriority > 0".  The recommendation is thus to use a nested query (SELECT * FROM jobqueue WHERE docpriority > 0 AND EXISTS(...) ORDER BY docpriority ASC) to force the read from the ordering index.  But since the state of the document is not taken into account at this level, a lot of work might need to be done in pathological cases, unless I could guarantee that a doc priority only appears at all when the document is in the right state.  I also don't want to break PostgreSQL, which I believe is capable of working with indexes of the form (ordering clauses, where clauses).

(2) HSQLDB's index-matching rules give precedence to non-parenthesized expressions over parenthesized ones, e.g. (A OR B OR C) AND E will preferentially use an index on E.  I have to find a way to keep this from messing up some of our queries.
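
The rewrite suggested in (1) can be sketched as follows.  This is only an illustration under stated assumptions: it runs against an in-memory SQLite database as a stand-in (not HSQLDB), the two-table schema is a hypothetical, heavily cut-down version of jobs and jobqueue, the status value 'C' is made up, and the contents of EXISTS(...) are elided in the comment above, so moving the state filters inside it is one guess at the intended shape.  It shows that the two forms return the same rows when every docpriority is positive; it says nothing about which index either engine actually reads.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, jobid INTEGER,
                       status TEXT, docpriority REAL);
CREATE INDEX jobqueue_docpriority ON jobqueue (docpriority);
""")
conn.execute("INSERT INTO jobs VALUES (1, 'A')")
conn.executemany(
    "INSERT INTO jobqueue (jobid, status, docpriority) VALUES (?, ?, ?)",
    [(1, "P", 3.0), (1, "G", 1.0), (1, "C", 2.0), (1, "P", 0.5)],
)

# Original shape: the ORDER BY column does not appear in the WHERE clause.
PLAIN = """
SELECT t0.id FROM jobqueue t0
WHERE t0.status IN ('P','G')
  AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t0.jobid=t1.id AND t1.status='A')
ORDER BY t0.docpriority ASC LIMIT 400
"""

# Rewritten shape: a dummy inequality on the ordering column at the outer
# level, with the remaining filters pushed into the EXISTS (an assumption).
FORCED = """
SELECT t0.id FROM jobqueue t0
WHERE t0.docpriority > 0
  AND EXISTS(SELECT 'x' FROM jobqueue t2, jobs t1
             WHERE t2.id=t0.id AND t2.status IN ('P','G')
               AND t2.jobid=t1.id AND t1.status='A')
ORDER BY t0.docpriority ASC LIMIT 400
"""

plain_rows = [row[0] for row in conn.execute(PLAIN)]
forced_rows = [row[0] for row in conn.execute(FORCED)]
print(plain_rows, forced_rows)
```

The pathological case mentioned in (1) is visible in the second form: the outer level would walk every row with a positive docpriority, including rows in the wrong state, and only the EXISTS would discard them.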

                

        

[jira] [Commented] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137773#comment-13137773 ] 

Karl Wright commented on CONNECTORS-284:
----------------------------------------

After much back-and-forth with the HSQLDB team, it appears that there are several issues.

The first issue is that indexes do not match inequalities in the WHERE clause.  The HSQLDB team has promised a fix for this.

The second issue is that HSQLDB will not read rows from the index except as a last resort.  This matters for the stuffer query.  While there are ways to force it to use the index, they require adding a dummy clause to the WHERE that mentions the ORDER BY column.  For doc priority, the dummy clause must be an inequality, but inequalities terminate all index matching, so there's no possibility of just folding docpriority into the existing index and expecting reasonable performance in most cases.  Until this is ironed out, HSQLDB remains too limited to recommend for general crawling.  The HSQLDB team has not offered an enhancement for this problem so far.
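
The "inequalities terminate all index matching" constraint can be pictured with a toy model.  This is a simplified sketch of the rule as described above, not HSQLDB's actual planner logic; the function name and the example index column lists are hypothetical.

```python
def usable_prefix(index_columns, predicates):
    """Return the leading index columns a WHERE clause can match.

    predicates maps a column name to '=' or to an inequality operator.
    Matching walks the index columns in order, stops at the first column
    with no predicate, and stops right after the first inequality.
    """
    used = []
    for column in index_columns:
        op = predicates.get(column)
        if op is None:
            break
        used.append(column)
        if op != "=":
            break  # an inequality ends index matching
    return used


where = {"docpriority": ">", "status": "=", "checkaction": "="}

# Ordering column first, so rows could be read in ORDER BY order:
# only docpriority is matched, the other filters get no index help.
print(usable_prefix(["docpriority", "status", "checkaction"], where))

# Filter columns first: everything matches, but the index no longer
# delivers rows in docpriority order across different status values.
print(usable_prefix(["status", "checkaction", "docpriority"], where))
```

Under this model, putting docpriority first is what allows an ordered read, and it is also what cuts off matching for every column after it.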


                

        

[jira] [Commented] (CONNECTORS-284) HSQLDB load test runs out of memory and has lots of very slow queries

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136112#comment-13136112 ] 

Karl Wright commented on CONNECTORS-284:
----------------------------------------

I created a branch (CONNECTORS-284) that has some changes, most notably strict adherence to index column order in AND clauses.  I'm still getting some long-running queries, though:

{code}
Found a query that took more than a minute (66375 ms): [SELECT * FROM agents]

Found a query that took more than a minute (97358 ms): [SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE EXISTS(SELECT 'x' FROM jobs t1 WHERE t0.jobid=t1.id AND t1.status IN(?,?) AND t1.priority=?) AND t0.checkaction=? AND t0.checktime<=? AND t0.status IN (?,?) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t0.dochash=t2.dochash AND t0.jobid!=t2.jobid AND t2.status IN (?,?,?,?,?,?)) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC LIMIT 400]
  Parameter 0: 'A'
  Parameter 1: 'a'
  Parameter 2: '5'
  Parameter 3: 'R'
  Parameter 4: '1319628631492'
  Parameter 5: 'P'
  Parameter 6: 'G'
  Parameter 7: 'A'
  Parameter 8: 'F'
  Parameter 9: 'a'
  Parameter 10: 'f'
  Parameter 11: 'D'
  Parameter 12: 'd'

Found a query that took more than a minute (92926 ms): [SELECT jobid,CAST(COUNT(dochash) AS BIGINT) AS doccount FROM jobqueue t1 WHERE EXISTS(SELECT 'x' FROM jobs t0 WHERE t0.id=t1.jobid AND id=?) GROUP BY jobid]
  Parameter 0: '1319627670575'

Found a query that took more than a minute (92248 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs t0  WHERE id=? ORDER BY description ASC]
  Parameter 0: '1319627670575'
  
Found a query that took more than a minute (62997 ms): [SELECT t0.id,t0.dochash,t0.docid FROM jobqueue t0 WHERE t0.jobid=? AND EXISTS(SELECT 'x' FROM carrydown t1 WHERE t0.jobid=t1.jobid AND parentidhash IN (?) AND t1.childidhash=t0.dochash AND t1.isnew=?)]
  Parameter 0: '1319627670575'
  Parameter 1: '0B75C77A4A188D383A1BCD0E138182E37AEF8AE8'
  Parameter 2: 'B'
{code}

Only a couple of these make even a glimmer of sense; the reasons for the others being slow are completely opaque.  I've asked the HSQLDB team for their opinion.
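
For ranking entries in a log like the one above, a small script can pull out the elapsed time and query text.  This is a minimal sketch that assumes every entry follows the exact "Found a query that took more than a minute (N ms): [...]" layout shown; the function name is hypothetical, and the sample lines are copied from the log above.

```python
import re

# One slow-query entry: elapsed milliseconds, then the SQL in square brackets.
SLOW_QUERY = re.compile(
    r"^Found a query that took more than a minute \((\d+) ms\): \[(.*)\]$"
)


def parse_slow_queries(lines):
    """Return (elapsed_ms, sql) pairs, slowest first; other lines are skipped."""
    found = []
    for line in lines:
        match = SLOW_QUERY.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return sorted(found, key=lambda pair: pair[0], reverse=True)


sample = [
    "Found a query that took more than a minute (66375 ms): [SELECT * FROM agents]",
    "Found a query that took more than a minute (92248 ms): [SELECT t0.id,t0.description,t0.status,t0.starttime,t0.endtime,t0.errortext FROM jobs t0  WHERE id=? ORDER BY description ASC]",
    "  Parameter 0: '1319627670575'",
]

for elapsed_ms, sql in parse_slow_queries(sample):
    print(elapsed_ms, sql[:40])
```

Parameter lines are deliberately ignored here; attaching them to the preceding query would need a little extra state in the loop.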

                