You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Stan <st...@stanfordalumni.org> on 2005/07/05 19:16:10 UTC

Does Derby optimize queries w/ ORDER BY and "LIMIT"? (w/ indexes?)

Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when
indexes are present?

My situation: I have an existing Derby table listing 2 million cities,
with the name, latitude, longitude, and population of each city. I want to
efficiently find the 50 most populous cities between (for example)
latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple:

Statement s =
DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
            
s.setMaxRows(50);
            
rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and 
lon>19.8 and lon<27.9 ORDER BY population desc");

There are indexes on lat, lon, and population, but the query seems to take
a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to
speed things up at all. It looks like Derby finds ALL the cities in the
specified latitude/longitude range (over 50,000 of them), and that
setMaxRows() just limits how many rows it shows me, not how many rows it
computes.

I know that MySQL optimizes queries with LIMITs -- does Derby do the same?

At the risk of offending, is there a better way to do what I want (by
creating my own data structures for example) without necessarily using
Derby?

(In case anyone's interested, I'm trying to update a map, so finding the
biggest cities in a given area quickly is important)

Re: Does Derby optimize queries w/ ORDER BY and "LIMIT"? (w/ indexes?)

Posted by Mike Matrigali <mi...@sbcglobal.net>.

having said all that, i don't know if the optimizer will pick
the population index - since it will think it has to scan all rows
of that index to figure out what rows meet the lat/long conditions.

As you point out knowing that only the top 50 rows are necessary is
key to costing the plan.

Mike Matrigali wrote:

> I'll start by saying that I am not an optimizer expert.  I don't think
> the optimizer does anything special with limits - but I could be wrong,
> maybe someone else with more knowledge in that area can let us know.
> 
> Having said that, the optimizer definitely will try to eliminate sorts
> necessary by order by if there are existing indexes which it can use
> to get the necessary order - in the case below it needs a descending
> index on the population column in cities.  Do you have such an index?
> 
> I think without such an index there is no choice in derby but to find
> all 50,000
> rows so that it can determine the 50 most populous, which meet your
> criteria. The Derby optimizer does not consider creating intermediate
> sort nodes in the query plan.  Also the sorter is not optimized for
> limited result sets, it
> tends to randomly create merge buckets and then sort those merge buckets
> and finally merge all the merge buckets - that is about the opposite
> of what you want for a limited result.  You rather have buckets for
> ranges and then once you had 50 more than a given value you could throw
> away all those bigger/less than the value.
> 
> The limit function as you say I believe mostly just cuts off the rows,
> it doesn't affect the processing much.  When possible derby trys to
> stream rows back to the user rather than processing them all before
> returning the first row - but an order by without a supporting index is
> a case where all the results of the query will be determined and then
> as the final step they are thrown into the sorter before returning
> results to the caller.  Derby query execution does not add intermediate
> sort nodes.
> 
> Stan wrote:
> 
> 
>>Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when
>>indexes are present?
>>
>>My situation: I have an existing Derby table listing 2 million cities,
>>with the name, latitude, longitude, and population of each city. I want to
>>efficiently find the 50 most populous cities between (for example)
>>latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple:
>>
>>Statement s =
>>DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
>>            
>>s.setMaxRows(50);
>>            
>>rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and 
>>lon>19.8 and lon<27.9 ORDER BY population desc");
>>
>>There are indexes on lat, lon, and population, but the query seems to take
>>a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to
>>speed things up at all. It looks like Derby finds ALL the cities in the
>>specified latitude/longitude range (over 50,000 of them), and that
>>setMaxRows() just limits how many rows it shows me, not how many rows it
>>computes.
>>
>>I know that MySQL optimizes queries with LIMITs -- does Derby do the same?
>>
>>At the risk of offending, is there a better way to do what I want (by
>>creating my own data structures for example) without necessarily using
>>Derby?
>>
>>(In case anyone's interested, I'm trying to update a map, so finding the
>>biggest cities in a given area quickly is important)
>>
>>
> 
>

Re: Does Derby optimize queries w/ ORDER BY and "LIMIT"? (w/ indexes?)

Posted by Satheesh Bandaram <sa...@Sourcery.Org>.

As far as I know, Derby doesn't optimize based on maxRows. This is only
used to limit rows returned to clients, as you mentioned below.

Satheesh

Mike Matrigali wrote:

>I'll start by saying that I am not an optimizer expert.  I don't think
>the optimizer does anything special with limits - but I could be wrong,
>maybe someone else with more knowledge in that area can let us know.
>  
>
>  
>

Re: Does Derby optimize queries w/ ORDER BY and "LIMIT"? (w/ indexes?)

Posted by Mike Matrigali <mi...@sbcglobal.net>.

I'll start by saying that I am not an optimizer expert.  I don't think
the optimizer does anything special with limits - but I could be wrong,
maybe someone else with more knowledge in that area can let us know.

Having said that, the optimizer definitely will try to eliminate sorts
necessary by order by if there are existing indexes which it can use
to get the necessary order - in the case below it needs a descending
index on the population column in cities.  Do you have such an index?

I think without such an index there is no choice in derby but to find
all 50,000
rows so that it can determine the 50 most populous, which meet your
criteria. The Derby optimizer does not consider creating intermediate
sort nodes in the query plan.  Also the sorter is not optimized for
limited result sets, it
tends to randomly create merge buckets and then sort those merge buckets
and finally merge all the merge buckets - that is about the opposite
of what you want for a limited result.  You rather have buckets for
ranges and then once you had 50 more than a given value you could throw
away all those bigger/less than the value.

The limit function as you say I believe mostly just cuts off the rows,
it doesn't affect the processing much.  When possible derby trys to
stream rows back to the user rather than processing them all before
returning the first row - but an order by without a supporting index is
a case where all the results of the query will be determined and then
as the final step they are thrown into the sorter before returning
results to the caller.  Derby query execution does not add intermediate
sort nodes.

Stan wrote:

> Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when
> indexes are present?
> 
> My situation: I have an existing Derby table listing 2 million cities,
> with the name, latitude, longitude, and population of each city. I want to
> efficiently find the 50 most populous cities between (for example)
> latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple:
> 
> Statement s =
> DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
>             
> s.setMaxRows(50);
>             
> rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and 
> lon>19.8 and lon<27.9 ORDER BY population desc");
> 
> There are indexes on lat, lon, and population, but the query seems to take
> a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to
> speed things up at all. It looks like Derby finds ALL the cities in the
> specified latitude/longitude range (over 50,000 of them), and that
> setMaxRows() just limits how many rows it shows me, not how many rows it
> computes.
> 
> I know that MySQL optimizes queries with LIMITs -- does Derby do the same?
> 
> At the risk of offending, is there a better way to do what I want (by
> creating my own data structures for example) without necessarily using
> Derby?
> 
> (In case anyone's interested, I'm trying to update a map, so finding the
> biggest cities in a given area quickly is important)
> 
>