You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Harri Pesonen <fu...@nic.fi> on 2006/01/26 19:35:27 UTC

A few questions about index usage

I have been testing Derby for a few days. I noticed a couple of odd 
things. Table definition is

create table Customer (
    Id bigint generated by default as identity primary key,
    Name varchar(128),
    Modified timestamp default current_timestamp
    )

There are no extra indexes, but Id gets automatically an index. I have 
inserted a million rows.

1) "SELECT MIN(Id) FROM Customer" and "SELECT MAX(Id) FROM Customer" are 
both fast, but "SELECT MIN(Id), MAX(Id) FROM Customer" is slow, taking 5 
seconds. Why?

2) "SELECT * FROM Customer ORDER BY Id" is fast but "SELECT * FROM 
Customer ORDER BY Id DESC" is slow. Why? Can't it scroll the index 
backwards?

I am wondering if Derby is being improved in this area, and if I could 
help. I have developed database systems in the past.

Thanks, Harri


Re: A few questions about index usage

Posted by Rajesh Kartha <ka...@gmail.com>.
HI Harri,

Allowing reverse scans in Derby  would be a nice enhancement. In addition,
the CREATE INDEX statement
could be modified to provide the 'ALLOW/DISALLOW REVERSE SCANS'  option
during index creation.

You may begin by logging an 'Enhancement' issue in JIRA and
proposing/discussing your ideas
to achieve this, on the derby-dev mailing list.

The Derby wiki has some useful information if you plan to get involved.

http://wiki.apache.org/db-derby/

-Rajesh



On 1/26/06, Jeffrey Lichtman <sw...@rcn.com> wrote:
>
>
> >1) "SELECT MIN(Id) FROM Customer" and "SELECT MAX(Id) FROM Customer"
> >are both fast, but "SELECT MIN(Id), MAX(Id) FROM Customer" is slow,
> >taking 5 seconds. Why?
>
> Derby knows how to use an index to quickly find a minimum or a
> maximum (by traversing down one side of the B-tree or the other). It
> doesn't know how to do both in the same query, which would take two
> traversals.
>
> >2) "SELECT * FROM Customer ORDER BY Id" is fast but "SELECT * FROM
> >Customer ORDER BY Id DESC" is slow. Why? Can't it scroll the index
> backwards?
>
> Nope - Derby doesn't do backwards scrolling. I think the physical
> data layout could support it (i.e. there are backwards pointers), but
> backwards scrolling has never been implemented. Could someone more
> familiar with the store confirm this (Mike?).
>
>
>                         -        Jeff Lichtman
>                                  swazoo@rcn.com
>                                  Check out Swazoo Koolak's Web Jukebox at
>                                  http://swazoo.com/
>
>

Re: A few questions about index usage

Posted by Jeffrey Lichtman <sw...@rcn.com>.
>>Derby knows how to use an index to quickly find a minimum or a 
>>maximum (by traversing down one side of the B-tree or the other). 
>>It doesn't know how to do both in the same query, which would take 
>>two traversals.
>
>Is the work to fix this the same as making IN list use multiple probes,
>and/or makeing OR lists do multiple probes?  There are existing JIRA
>items for those, or is it different enough to have a separate JIRA?

I believe they're different. While each case would use multiple 
probes, in the MIN/MAX case one of the probes would go down the right 
side of the BTree, while in the IN/OR case it would do "normal" 
scans. Also, the MIN/MAX case and the IN/OR case would require 
different logic to recognize when the optimizations are possible. The 
costing logic in the optimizer would be different, too. It's possible 
the two cases could share some execution code, but the rest of it 
would require different implementations.


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/ 


Re: A few questions about index usage

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Jeffrey Lichtman wrote:
> 
>> 1) "SELECT MIN(Id) FROM Customer" and "SELECT MAX(Id) FROM Customer" 
>> are both fast, but "SELECT MIN(Id), MAX(Id) FROM Customer" is slow, 
>> taking 5 seconds. Why?
> 
> 
> Derby knows how to use an index to quickly find a minimum or a maximum 
> (by traversing down one side of the B-tree or the other). It doesn't 
> know how to do both in the same query, which would take two traversals.

Is the work to fix this the same as making IN list use multiple probes,
and/or makeing OR lists do multiple probes?  There are existing JIRA
items for those, or is it different enough to have a separate JIRA?
> 
>> 2) "SELECT * FROM Customer ORDER BY Id" is fast but "SELECT * FROM 
>> Customer ORDER BY Id DESC" is slow. Why? Can't it scroll the index 
>> backwards?
The structure could support this, the work just has not been done.  With
row level locking, concurrent tree splitting, and current assumptions
throughout the access method that scans go top/down, left to right the
work to do a backward scan is harder than doing a forward scan.  Also
once the store portion is done, there would be necessary changes to:
optimizer, execution engine, and scan interface to allow the backward
scan.  This would be a hard first project.

The current workaround is that you can create both an ascending and
a descending index if you need ordering in both directions to go fast.

You should check out the access/btree white papers Dibyendo (sp?) 
produced, on the web site.

I will go ahead and log an enhancement in JIRA (DERBY-884)
> 
> 
> Nope - Derby doesn't do backwards scrolling. I think the physical data 
> layout could support it (i.e. there are backwards pointers), but 
> backwards scrolling has never been implemented. Could someone more 
> familiar with the store confirm this (Mike?).
> 
> 
>                        -        Jeff Lichtman
>                                 swazoo@rcn.com
>                                 Check out Swazoo Koolak's Web Jukebox at
>                                 http://swazoo.com/
> 
> 


Re: A few questions about index usage

Posted by Jeffrey Lichtman <sw...@rcn.com>.
>1) "SELECT MIN(Id) FROM Customer" and "SELECT MAX(Id) FROM Customer" 
>are both fast, but "SELECT MIN(Id), MAX(Id) FROM Customer" is slow, 
>taking 5 seconds. Why?

Derby knows how to use an index to quickly find a minimum or a 
maximum (by traversing down one side of the B-tree or the other). It 
doesn't know how to do both in the same query, which would take two traversals.

>2) "SELECT * FROM Customer ORDER BY Id" is fast but "SELECT * FROM 
>Customer ORDER BY Id DESC" is slow. Why? Can't it scroll the index backwards?

Nope - Derby doesn't do backwards scrolling. I think the physical 
data layout could support it (i.e. there are backwards pointers), but 
backwards scrolling has never been implemented. Could someone more 
familiar with the store confirm this (Mike?).


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/