You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2011/03/17 23:18:29 UTC

[jira] Created: (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Optimizer uses table scan when it could use index when multiple OR clauses
--------------------------------------------------------------------------

                 Key: DERBY-5142
                 URL: https://issues.apache.org/jira/browse/DERBY-5142
             Project: Derby
          Issue Type: Bug
            Reporter: Karl Wright


The Derby optimizer doesn't seem to recognize that a query like this:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)

... might be best served by using an index declared on hopcount as (jobid,linktype,parentidhash).  Other databases have no trouble with constructs like this.

This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.

I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5142:
---------------------------------

    Attachment: repro5142.diff

Uploading a repro for this issue in the form of a Derby JUnit test. I am not yet sure if this is a bug, or a limitation in the current implementation. 

> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5142:
---------------------------------

    Issue Type: Improvement  (was: Bug)

> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010738#comment-13010738 ] 

Dag H. Wanvik commented on DERBY-5142:
--------------------------------------

It seems this is a limitation of the current multi-probe scans. The OR predicates *can* be transformed in a an IN <set> representation, cf this comment on DERBY-47  which introduced this scan type:

https://issues.apache.org/jira/browse/DERBY-47?focusedCommentId=12475414&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12475414

"Yes, transformation of ORs into IN-lists occurs during preprocessing of the OR list. In OrNode.preprocess() there is logic to recognize if an OR list is transformable into an IN-list and, if so, the IN-list is created and then the "preprocess()" method of the IN-list is called. Since the creation of "probe predicates" occurs as part of IN-list preprocessing, this means that Yes, ORs are already converted to an IN-list before my new code takes effect."

The limitation here is that the multi-probe scan of an index only works for single column predicates, where the column must be the first column of an index. I'll change the issue type into "improvement".


> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-5142:
-----------------------------------

    Labels: derby_triage10_10  (was: )
    
> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL, Store
>            Reporter: Karl Wright
>              Labels: derby_triage10_10
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Karl Wright (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010128#comment-13010128 ] 

Karl Wright commented on DERBY-5142:
------------------------------------

I'm using 10.7.1.1 plus a patch for the DERBY-5072 issue I also reported.


> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Karl Wright (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated DERBY-5142:
-------------------------------

    Description: 
The Derby optimizer doesn't seem to recognize that a query like this:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)

... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.

This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.

I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 



  was:
The Derby optimizer doesn't seem to recognize that a query like this:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)

... might be best served by using an index declared on hopcount as (jobid,linktype,parentidhash).  Other databases have no trouble with constructs like this.

This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.

I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:

SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 




> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>            Reporter: Karl Wright
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor updated DERBY-5142:
-----------------------------------

    Urgency: Normal
    
> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5142:
---------------------------------

           Component/s: Store
                        SQL
      Issue & fix info: [Repro attached]
    Bug behavior facts: [Performance, Seen in production]  (was: [Performance])

> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-5142) Optimizer uses table scan when it could use index when multiple OR clauses

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5142:
---------------------------------


I can reproduce it on trunk. Btw, what version of Derby are you using?

> Optimizer uses table scan when it could use index when multiple OR clauses
> --------------------------------------------------------------------------
>
>                 Key: DERBY-5142
>                 URL: https://issues.apache.org/jira/browse/DERBY-5142
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Store
>            Reporter: Karl Wright
>         Attachments: repro5142.diff
>
>
> The Derby optimizer doesn't seem to recognize that a query like this:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) OR (jobid=? AND linktype=? AND parentidhash=?)
> ... might be best planned by using an index declared on hopcount as (jobid,linktype,parentidhash).  Instead, a table scan is always used, no matter how big the table.  Other databases have no trouble with constructs like this.
> This is a very common situation, and blocks Apache ManifoldCF from using Derby as its primary database choice.
> I've verified that the index IS successfully used with the same table statistics when the query has only ONE clause, e.g.:
> SELECT id,distance,linktype FROM hopcount WHERE (jobid=? AND linktype=? AND parentidhash=?) 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira