You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by nigelm <ni...@gmail.com> on 2006/09/03 16:00:42 UTC

Large IN clause performance

Hello

I have a dataset that's too big to be viable for HSQLDB, so I loaded it into
derby instead.

The existing code makes a lot of use of SQL "IN" clauses, where the term can
often contains thousands of items.

This performs very badly on 10.1.3.1 - because it seems to ignore indexes
and just use the min and max terms to do a table scan - this isn't going to
be good as there's millions of rows in there. In fact, even if the IN clause
contains only a handful of entries it goes quite slowly.

Is this something that's addressed in a later version (should I bump to a
pre-release), or am I stuck with either re-writing to get better
performance?


-- 
View this message in context: http://www.nabble.com/Large-IN-clause-performance-tf2210872.html#a6122846
Sent from the Apache Derby Users forum at Nabble.com.


Re: Large IN clause performance

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> The existing code makes a lot of use of SQL "IN" clauses, where the term can
> often contains thousands of items.
> 
> This performs very badly on 10.1.3.1 - because it seems to ignore indexes
> and just use the min and max terms to do a table scan

I think this is DERBY-47, also logged as DERBY-713:

http://issues.apache.org/jira/browse/DERBY-47
http://issues.apache.org/jira/browse/DERBY-713

The JIRA issues have some discussion of possible alternate strategies and ways
to rewrite queries to get better performance, but I don't think there's any
easy solution available right now, sorry.

We'd love to have help in this area; perhaps you can consider helping to work
on possible implementations?

thanks,

bryan