You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Tony Brusseau (JIRA)" <ji...@apache.org> on 2009/10/23 19:38:59 UTC

[jira] Created: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Extremely slow subqueries when subquerying on strings
-----------------------------------------------------

                 Key: DERBY-4422
                 URL: https://issues.apache.org/jira/browse/DERBY-4422
             Project: Derby
          Issue Type: Bug
    Affects Versions: 10.5.3.0
         Environment: Suse Unix 10.3.
            Reporter: Tony Brusseau


SELECT DISTINCT C.INTERNAL_ID 
FROM OC_CONCEPTS C
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))

Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 

However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.

SELECT DISTINCT C.INTERNAL_ID 
FROM OC_CONCEPTS C
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777048#action_12777048 ] 

Tony Brusseau edited comment on DERBY-4422 at 11/12/09 5:01 PM:
----------------------------------------------------------------

Knut,

Thanks for looking into this. The original query that I posted is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make noncorrelated references to the same table,  OC_CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 


      was (Author: apb):
    Knut,

Thanks for looking into this. The original query that I posted is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make noncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: SQLFile2.sqlab.gz

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770101#action_12770101 ] 

Bryan Pendleton commented on DERBY-4422:
----------------------------------------

I may be reading the query plans wrong, but it seems like both 
query plans that you attached are for the OCS.NL_LC = 'dash' case;
I don't see a query plan for the WHERE OCS.CONCEPT_ID=1 case.


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 7:42 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner (or possibly that the materialization optimization is too narrow and should be broadened). There may be a secondary issue noted by Bryan where minor changes in the query yield a dramatically less good optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yield a dramatically less good optimization strategy.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865243#action_12865243 ] 

Tony Brusseau edited comment on DERBY-4422 at 5/7/10 1:32 PM:
--------------------------------------------------------------

I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE,
                C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE (C.EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.009 s, 0 error(s) occurred.

      was (Author: apb):
    I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.
  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770062#action_12770062 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

I compressed them and they shrank dramatically to 11Mb and 13Mb, however Jira has a 10Mb limit.


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865243#action_12865243 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down):


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775923#action_12775923 ] 

Bryan Pendleton commented on DERBY-4422:
----------------------------------------

Does this indicate a weakness in the Derby query-flattening algorithm? I read through
http://db.apache.org/derby/docs/10.5/tuning/ctuntransform36368.html
and was unsure whether Derby should have been able to flatten this query
automatically or not. It seems like the crucial bit is:

  "There is a uniqueness condition that ensures that the subquery does not introduce 
   any duplicates if it is flattened into the outer query block"

But I think that the DISTINCT in the *outer* query should have provided such a guarantee.

I think that the DISTINCT in the *subquery* is unnecessary, due to:
http://db.apache.org/derby/docs/10.5/tuning/ctuntransform867165.html


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770066#action_12770066 ] 

Bryan Pendleton commented on DERBY-4422:
----------------------------------------

Are they SQL text files? Can you break each file in half, into a Part-1 and Part-2, and
then zip each one up separately (thus maybe resulting in 4 7Mb files, e.g.)?


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

(SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Pendleton updated DERBY-4422:
-----------------------------------

    Attachment: goodqp.txt
                badqp.txt

Tony, thanks for posting all the information; it is very helpful!

I took a quick look at the query plans in the derby.log which you posted.

The query plan for CONCEPT_ID=1 is *dramatically* different than the query plan for NL_LC='dash'.

In both cases, the overall structure of the query plan is to perform an "exists" join in which
the outer table is OC_CONCEPTS and the inner table in OC_CONCEPT_STRINGS.

But in the CONCEPT_ID=1 case, Derby chooses a HASH EXISTS JOIN, with a
HASH SCAN RESULT SET used for the OC_CONCEPT_STRINGS inner table.

While in the NL_LC='dash' case, Derby choose a NESTED LOOP EXISTS JOIN, with
a RESTRICTION node wrapped around an INDEX ROW TO BASE ROW result set.

In either case, Derby processes all of the 150,050 rows in OC_CONCEPTS by scanning that table,
then checks each row against OC_CONCEPT_STRINGS to see if it qualifies.

But in the HASH case, that check against the inner table is extremely efficient, while in
the index-lookup case, the check is dramatically more expensive.

I attached the two query plan sections from derby.log which I studied as 'badqp.txt' and 'goodqp.txt'.

I don't know why Derby picks such a dreadful query plan in the one case, and such a
superior plan in the other case.

It's interesting that *neither* plan matches the plan that you expected: namely, to
perform the entire non-correlated inner DISTINCT query first, then use those results
to probe into the outer OC_CONCEPTS table. I don't know whether Derby considered
that plan and decided not to choose it, or in fact didn't even consider it at all. I think
that the effectiveness of that plan depends considerably on the observation that in
both the CONCEPT_ID =1 case and the NL_LC='dash' case there are only a handful
of rows in OC_CONCEPT_STRINGS which match that restriction; I'm not sure if
Derby knows that or not.

Hopefully some other people will chime in with their own observations on the
query plans that you have posted; I find them quite interesting.

Lastly, I think it may be possible that you could use optimizer overrides (--DERBY-PROPERTIES)
to control the optimizer's query plan choice, but I'm not an expert on these overrides
and hope that somebody else can comment on whether they could be used here or not.
Here's some doc on the --DERBY-PROPERTIES feature:
http://db.apache.org/derby/docs/10.5/tuning/ctunoptimzoverride.html


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643 ] 

Tony Brusseau edited comment on DERBY-4422 at 10/27/09 10:24 PM:
-----------------------------------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside out which should make the performance between the two schemes equivalent.


      was (Author: apb):
    When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777048#action_12777048 ] 

Tony Brusseau edited comment on DERBY-4422 at 11/12/09 4:53 PM:
----------------------------------------------------------------

Knut,

Thanks for looking into this. The original query that I posted is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make noncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 


      was (Author: apb):
    Knut,

Thanks for looking into this. The original query that I posted is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make uncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 6:12 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yield a dramatically less good optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields a dramatically less good optimization strategy.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842843#action_12842843 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

According to the materialization documentation:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html

"Derby does not flatten DISTINCT subqueries."

Is that documentation wrong?


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775919#action_12775919 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

I downloaded the database and ran the queries, and I see the same behaviour.

One possible workaround is to rewrite the IN query to a join manually. So instead of

SELECT DISTINCT C.INTERNAL_ID
FROM OC_CONCEPTS C
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

try this equivalent query

SELECT DISTINCT C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS OCS
WHERE C.INTERNAL_ID = OCS.CONCEPT_ID AND OCS.NL_LC = 'dash'

The latter query is much faster in my environment.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769505#action_12769505 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

 Step 1: I don't think this is a compilation issue since I originally noticed the problem when executing prepared statements.


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: SQLFile1.sqlaa.gz

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: derby.log

I deleted my database then recreated it using 10.5.3.0, then I ran SYSCS_UPDATE_STATISTICS. I'm still getting the extremely slow behavior. I've attached derby.log which contains the unlimited query plans when calling:

SELECT DISTINCT INTERNAL_ID 
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE CONCEPT_ID = 1))

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643 ] 

Tony Brusseau edited comment on DERBY-4422 at 10/27/09 10:29 PM:
-----------------------------------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside out which should make the performance between the two schemes above equivalent, however, I'm experiencing ~4 orders of magnitude difference in performance.


      was (Author: apb):
    When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside out which should make the performance between the two schemes above equivalent, however, I'm experiencing ~4 order of magnitude difference in performance.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 5:45 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields dramatically different optimizations.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 5:54 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields dramatically different optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields dramatically different optimizations.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842858#action_12842858 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

The DISTINCT keyword in the query is redundant and will be removed before flattening is considered (see the section "DISTINCT elimination in IN, ANY, and EXISTS subqueries" linked from that manual page). The rewriting logic therefore doesn't regard it as a DISTINCT subquery.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777079#action_12777079 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

The documentation on materialization seems relevant here:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html

Since the query is uncorrelated (and if it were unflattenable) then the query should be optimized like expected.



> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842743#action_12842743 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

I put a break point in SubqueryNode.preprocess() and checked what happened during compilation of the query that used the IN operator. It turns out that it does actually flatten the query, but only to an exists join and not to a normal join. When I hard-coded that method to pick a normal join instead of an exists join the query ran just as fast as the manually written join above. So I guess what we need to find out is why the exists join is so much slower than the normal join in this case.

(I think the optimizer is correct here in picking an exists join instead of a normal join. My understanding from reading the tuning guide and the comments in the code is that flattening to a normal join would only work correctly if there were a unique constraint or similar on OC_CONCEPT_STRINGS.CONCEPT_ID.)

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643 ] 

Tony Brusseau edited comment on DERBY-4422 at 10/27/09 9:22 PM:
----------------------------------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.


      was (Author: apb):
    When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

(SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770832#action_12770832 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

Regarding the "expected" plan, I don't think it will be considered by Derby. Here's
what was answered in DERBY-47 to a review question about IN subqueries:

>> Does such a query generate and use the new style Multi-Probe processing?
>
> No, it does not. Multi-probe processing only occurs if the IN-list
> is solely comprised of constant and/or parameter nodes. A subquery
> is neither constant nor parameter, hence no multi-probing will
> occur.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770565#action_12770565 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

I've added 2 sql dump files that will load the DB.

To recover the original files, you'd have to do something like:

gunzip *.gz
cat SQLFile1.sqlaa SQLFile1.sqlab > SQLFile1.sql
cat SQLFile2.sqlaa SQLFile2.sqlab SQLFile2.sqlac > SQLFile2.sql

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: SQLFile1.sqlab.gz

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: tmp2.sql

This is the DB look information

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769457#action_12769457 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

The procedure should be there in version 10.5.3.0, but your database was probably created with an earlier version of Derby and hasn't been upgraded to the new database format yet, I'd guess. For the purpose of this experiment, the following statements should have the same effect and refresh the index statistics:

  CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('BLAH', 'OC_CONCEPTS', 1);
  CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('BLAH', 'OC_CONCEPT_STRINGS', 1);

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769390#action_12769390 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

I think both the queries above satisfy the requirements for being flattened to an exists join (http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25868.html), which normally means that they should perform reasonably well.

If you could attach the output from dblook (at least the parts with table and index definitions for the tables involved in the queries) it will be easier for others to reproduce the problem.

Just to eliminate stale index cardinality statistics as the culprit, do you still see the problem if you first invoke the following two statements? (Replace 'APP' with the actual name of your schema.)

  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'OC_CONCEPTS', NULL);
  CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'OC_CONCEPT_STRINGS', NULL);

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770560#action_12770560 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

Yeah the query plans that I added were for the real query that I'm trying to optimize. 

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

I'll add query plans for the queries I mentioned in the original report (which are simplifications of this query) in a bit.


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-4422:
----------------------------------

    Component/s: SQL

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: SQLFile2.sqlaa.gz

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769495#action_12769495 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

The database is based on open source data, so I have no issues with uploading it. Is it OK to upload 2 sql files that total about ~160Mb?

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 7:43 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner (or possibly that the materialization optimization is too narrow and should be broadened to apply to a larger subset of queries). There may be a secondary issue noted by Bryan where minor changes in the query yield a dramatically less good optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner (or possibly that the materialization optimization is too narrow and should be broadened). There may be a secondary issue noted by Bryan where minor changes in the query yield a dramatically less good optimization strategy.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770643#action_12770643 ] 

Tony Brusseau edited comment on DERBY-4422 at 10/27/09 10:28 PM:
-----------------------------------------------------------------

When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside out which should make the performance between the two schemes above equivalent, however, I'm experiencing ~4 order of magnitude difference in performance.


      was (Author: apb):
    When the problematic query is separated into 2 separate queries, each separate query runs extremely fast:

SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (23146, 97675, 105192, 127143))

However, when asked in subquery form:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash')) 

the query takes over 6 seconds to run.

According to The Practical SQL Handbook, noncorrelated queries are evaluated from the inside out which should make the performance between the two schemes equivalent.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: derby.log, fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777048#action_12777048 ] 

Tony Brusseau edited comment on DERBY-4422 at 11/12/09 4:51 PM:
----------------------------------------------------------------

Knut,

Thanks for looking into this. The original query that I posted is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make uncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 


      was (Author: apb):
    Knut,

Thanks for looking into this. Unfortunately, the query that is giving the problem is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make uncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777048#action_12777048 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

Knut,

Thanks for looking into this. Unfortunately, the query that is giving the problem is the most simplified query I could come up with that exposes the issue. Unfortunately, the real query I'm trying to optimize can't be simplified in the manner you described and remain logically equivalent (both the inner query and outter query make uncorrelated references to CONCEPT_STRINGS)..

Here is the query I really want to work:

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL,
   C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE,
   C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE
   (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC like 'dash%'))
   AND
   (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE 


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769474#action_12769474 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

Those new commands worked. They ran for a minute or so and completed normally. No difference with performance.

I also realized that I was using Derby 10.4.2.1 (the version that comes with JDK 1.6u16) rather than Derby 10.5.3.0 like I claimed in the original bug report. I did run it in 10.5.3.0 and got the same poor performance.





> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769442#action_12769442 ] 

Tony Brusseau commented on DERBY-4422:
--------------------------------------

Not sure why these queries would be flattened. The inner queries are not correlated with the outter queries. One would think that the inner query would be run first, then the outter query would be run using the results of the first query.

I tried calling:
CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('BLAH', 'OC_CONCEPTS', NULL);
and get the message:
Error code -1, SQL state 42Y03: 'SYSCS_UTIL.SYSCS_UPDATE_STATISTICS' is not recognized as a function or procedure.


PS I'm not a DB expert feel free to ignore me if what I say doesn't make sense 8-).



> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865243#action_12865243 ] 

Tony Brusseau edited comment on DERBY-4422 at 5/7/10 1:55 PM:
--------------------------------------------------------------

I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE,
                C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE (C.EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.009 s, 0 error(s) occurred.

      was (Author: apb):
    I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE,
                C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE (C.EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.009 s, 0 error(s) occurred.
  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865243#action_12865243 ] 

Tony Brusseau edited comment on DERBY-4422 at 5/7/10 2:06 PM:
--------------------------------------------------------------

I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID 
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE,
                C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE (C.EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.009 s, 0 error(s) occurred.

      was (Author: apb):
    I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID 
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE,
                C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID
FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS
WHERE (C.EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.009 s, 0 error(s) occurred.
  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: unlimited-time-derby.log
                fixed-time-derby.log

I've attached the query plans.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Brusseau updated DERBY-4422:
---------------------------------

    Attachment: SQLFile2.sqlac.gz

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: fixed-time-derby.log, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775939#action_12775939 ] 

Knut Anders Hatlen commented on DERBY-4422:
-------------------------------------------

Hi Bryan,

The condition you quoted was for flattening into a normal join. There is also an option to flatten into an exists join (http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25868.html) which does not have that condition and should apply if the outer DISTINCT does not provide that guarantee.

I'm not sure, though, if the guarantee provided by the outer DISTINCT is what the manual talks about, or if it actually requires that the sub-query does not return duplicates, even with the inner DISTINCT removed.

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 6:09 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields a dramatically different optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields dramatically different optimization strategy.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769673#action_12769673 ] 

Bryan Pendleton commented on DERBY-4422:
----------------------------------------

160 Mb sounds pretty big; I think Jira has some sort of file size limit.

If you zip up the SQL, does it compress down pretty well? Maybe you can
succeed in attaching the files as zipped attachments?


> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841949#action_12841949 ] 

Tony Brusseau edited comment on DERBY-4422 at 3/5/10 6:11 PM:
--------------------------------------------------------------

Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields a dramatically less good optimization strategy.


      was (Author: apb):
    Per the documentation on materialization:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html 

I rewrote the query from:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS
WHERE
   (INTERNAL_ID IN (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash'))

and transformed it into:

SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS, (SELECT DISTINCT CONCEPT_ID FROM OC_CONCEPT_STRINGS WHERE NL_LC = 'dash') X
WHERE (OC_CONCEPTS.INTERNAL_ID = X.CONCEPT_ID)

Now the query runs blindingly fast.  I also made a similar change to the bigger query that I want to do, and got the same dramatic speedup. I guess the real bug here is that the optimizer is not automatically optimizing the queries in such a manner. There may be a secondary issue noted by Bryan where minor changes in the query yields a dramatically different optimization strategy.

  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865243#action_12865243 ] 

Tony Brusseau edited comment on DERBY-4422 at 5/7/10 1:24 PM:
--------------------------------------------------------------

I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down:


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.

      was (Author: apb):
    I've discovered another somewhat strange performance issue with a related query where having more than one string in an "IN" clause causes orders of magnitude slow down):


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.

SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 0.003 s, 0 error(s) occurred.


SELECT DISTINCT C.EXTERNAL_ID, C.OWL_NL_ID, CYCL, C.PREFERRED_NL, C.PREFERRED_NL_SORTABLE, C.COMMENT, C.CONCEPT_TYPE, CS.NL, CS.STRING_TYPE, C.INTERNAL_ID FROM OC_CONCEPTS C, OC_CONCEPT_STRINGS CS, (SELECT DISTINCT INTERNAL_ID
FROM OC_CONCEPTS WHERE EXTERNAL_ID IN (
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71YxJScKRGxna3DeWNvcnAeK71Y0qCcKRGxna3DeWNvcnA',
  'Mw2CHiu-uEr4nCkRsZ2tw3ljb3JwDYMeK71Y5HacKRGxna3DeWNvcnAeK71Y6VGcKRGxna3DeWNvcnAeK71Y7BKcKRGxna3DeWNvcnA'
)) X
WHERE (C.INTERNAL_ID = X.INTERNAL_ID) AND (C.INTERNAL_ID = CS.CONCEPT_ID)
ORDER BY C.PREFERRED_NL_SORTABLE, C.INTERNAL_ID, CS.STRING_TYPE

==> Execution finished after 2.37 s, 0 error(s) occurred.
  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Tony Brusseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777079#action_12777079 ] 

Tony Brusseau edited comment on DERBY-4422 at 11/12/09 9:11 PM:
----------------------------------------------------------------

The documentation on materialization seems like it might be relevant here:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html

Since the query is uncorrelated (and if it were unflattenable) then the query should be optimized like "expected".



      was (Author: apb):
    The documentation on materialization seems relevant here:

http://db.apache.org/derby/docs/10.5/tuning/ctuntransform25857.html

Since the query is uncorrelated (and if it were unflattenable) then the query should be optimized like expected.


  
> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: badqp.txt, derby.log, fixed-time-derby.log, goodqp.txt, SQLFile1.sqlaa.gz, SQLFile1.sqlab.gz, SQLFile2.sqlaa.gz, SQLFile2.sqlab.gz, SQLFile2.sqlac.gz, tmp2.sql, unlimited-time-derby.log
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4422) Extremely slow subqueries when subquerying on strings

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769491#action_12769491 ] 

Bryan Pendleton commented on DERBY-4422:
----------------------------------------

Here are some general instructions for gathering more information for helping
us understand the performance behavior:
http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

If you could work through these, and attach the information and results that you find,
it would be very helpful.

Also: any chance you can contribute a complete standalone test case that demonstrates the problem?

> Extremely slow subqueries when subquerying on strings
> -----------------------------------------------------
>
>                 Key: DERBY-4422
>                 URL: https://issues.apache.org/jira/browse/DERBY-4422
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.5.3.0
>         Environment: Suse Unix 10.3.
>            Reporter: Tony Brusseau
>         Attachments: tmp2.sql
>
>
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.CONCEPT_ID=1))
> Both queries above are trivial and they are both acting on fully indexed primary key columns. This query runs just fine. 
> However, when I make the small modification of searching on an indexed text column in the sub query, it takes over 6 seconds to run, even though both queries independently take < 1ms run.
> SELECT DISTINCT C.INTERNAL_ID 
> FROM OC_CONCEPTS C
> WHERE
>    (C.INTERNAL_ID IN (SELECT DISTINCT OCS.CONCEPT_ID FROM OC_CONCEPT_STRINGS OCS WHERE OCS.NL_LC = 'dash'))

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.