You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by "arindam.bhattacharjee" <mr...@gmail.com> on 2009/04/07 13:44:20 UTC

URGENT!!! JDBC SQL query taking long time for large IN clause

Hello,

I would like my query below to return within 100 millisecs. Please help me,
and the values for the IN clause comes from outside hence cannot really
change the IN clause to a join on an existing table.

time taken to prepare statement = 0
time taken execute statement = 33375
time taken enumerate result set = 63

DDLs:

CREATE TABLE APP.OBJECT_MASTER (
    OBJECT_ID INTEGER NOT NULL,
    OBJECT_CUID VARCHAR(32) NOT NULL,
    PRIMARY KEY (OBJECT_ID)
);

CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
    OBJECT_ID INTEGER NOT NULL,
    CATEGORY_ID INTEGER NOT NULL
);

CREATE TABLE APP.CATEGORY_MASTER (
    CATEGORY_ID INTEGER NOT NULL,
    CATEGORY_NAME VARCHAR(255) NOT NULL,
    PARENT_ID INTEGER,
    PRIMARY KEY (CATEGORY_ID)
);

CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
(OBJECT_ID,OBJECT_CUID)

CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)

CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
(CATEGORY_ID,CATEGORY_NAME)

Data inside the tables:
OBJECT_MASTER = 1,000,000 rows
CATEGORY_MASTER = 10,000 rows
OBJECT_CATEGORY_MAPPING = 20,000,000 rows

SYSCS_GET_RUNTIMESTATISTICS() output:


Statement Name:
    null
Statement Text:
     select category_master.category_name,
count(category_master.category_name) as category_count  from  object_master, 
category_master,  object_category_mapping  where     
object_master.object_id = object_category_mapping.object_id and      
object_category_mapping.category_id = category_master.category_id and      
object_master.object_id in (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? , 
? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?      ) 
group by      category_master.category_name  order by      category_count
desc
Parse Time: 94
Bind Time: 31
Optimize Time: 125
Generate Time: 94
Compile Time: 344
Execute Time: 33235
Begin Compilation Timestamp : 2009-04-07 16:03:55.859
End Compilation Timestamp : 2009-04-07 16:03:56.203
Begin Execution Timestamp : 2009-04-07 16:03:56.296
End Execution Timestamp : 2009-04-07 16:05:36.515
Statement Execution Plan Text:
Sort ResultSet:
Number of opens = 1
Rows input = 1068
Rows returned = 1068
Eliminate duplicates = false
In sorted order = false
Sort information:
    Number of rows input=1068
    Number of rows output=1068
    Sort type=internal
    constructor time (milliseconds) = 0
    open time (milliseconds) = 33235
    next time (milliseconds) = 0
    close time (milliseconds) = 0
    optimizer estimated row count:     18811881.00
    optimizer estimated cost:     95276340.92

Source result set:
    Project-Restrict ResultSet (10):
    Number of opens = 1
    Rows seen = 1068
    Rows filtered = 0
    restriction = false
    projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 33172
        next time (milliseconds) = 47
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:     18811881.00
        optimizer estimated cost:     95276340.92

    Source result set:
        Grouped Aggregate ResultSet:
        Number of opens = 1
        Rows input = 19000
        Has distinct aggregate = false
        In sorted order = false
        Sort information:
            Number of merge runs=1
            Number of rows input=19000
            Number of rows output=1084
            Size of merge runs=[18220]
            Sort type=external
            constructor time (milliseconds) = 0
            open time (milliseconds) = 33172
            next time (milliseconds) = 47
            close time (milliseconds) = 0
            optimizer estimated row count:     18811881.00
            optimizer estimated cost:     95276340.92

        Source result set:
            Project-Restrict ResultSet (9):
            Number of opens = 1
            Rows seen = 19000
            Rows filtered = 0
            restriction = false
            projection = true
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 33001
                close time (milliseconds) = 16
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 16
                optimizer estimated row count:     18811881.00
                optimizer estimated cost:     95276340.92

            Source result set:
                Nested Loop Exists Join ResultSet:
                Number of opens = 1
                Rows seen from the left = 19000
                Rows seen from the right = 19000
                Rows filtered = 0
                Rows returned = 19000
                    constructor time (milliseconds) = 0
                    open time (milliseconds) = 0
                    next time (milliseconds) = 32954
                    close time (milliseconds) = 16
                    optimizer estimated row count:     18811881.00
                    optimizer estimated cost:     95276340.92

                Left result set:
                    Nested Loop Join ResultSet:
                    Number of opens = 1
                    Rows seen from the left = 1000
                    Rows seen from the right = 19000
                    Rows filtered = 0
                    Rows returned = 19000
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 32209
                        close time (milliseconds) = 16
                        optimizer estimated row count:     18811881.00
                        optimizer estimated cost:      4772381.42

                    Left result set:
                        Project-Restrict ResultSet (5):
                        Number of opens = 1
                        Rows seen = 104896
                        Rows filtered = 103896
                        restriction = true
                        projection = false
                            constructor time (milliseconds) = 0
                            open time (milliseconds) = 0
                            next time (milliseconds) = 31769
                            close time (milliseconds) = 16
                            restriction time (milliseconds) = 30628
                            projection time (milliseconds) = 0
                            optimizer estimated row count:       297031.20
                            optimizer estimated cost:        63997.96

                        Source result set:
                            Index Scan ResultSet for OBJECT_MASTER using
constraint SQL090406104857810 at read committed isolation level using share
row locking chosen by the optimizer
                            Number of opens = 1
                            Rows seen = 104896
                            Rows filtered = 0
                            Fetch Size = 1
                                constructor time (milliseconds) = 0
                                open time (milliseconds) = 31
                                next time (milliseconds) = 1126
                                close time (milliseconds) = 16
                                next time in milliseconds/row = 0

                            scan information:
                                Bit set of columns fetched={0}
                                Number of columns fetched=1
                                Number of deleted rows visited=0
                                Number of pages visited=566
                                Number of rows qualified=104896
                                Number of rows visited=104897
                                Scan type=btree
                                Tree height=3
                                start position:
    >= on first 1 column(s).
    Ordered null semantics on the following columns:

                                stop position:
    > on first 1 column(s).
    Ordered null semantics on the following columns:

                                qualifiers:
None
                                optimizer estimated row count:      
297031.20
                                optimizer estimated cost:        63997.96

                    Right result set:
                        Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation level
using share row locking chosen by the optimizer
                        Number of opens = 1000
                        Rows seen = 19000
                        Rows filtered = 0
                        Fetch Size = 1
                            constructor time (milliseconds) = 0
                            open time (milliseconds) = 47
                            next time (milliseconds) = 408
                            close time (milliseconds) = 16
                            next time in milliseconds/row = 0

                        scan information:
                            Bit set of columns fetched={0, 1}
                            Number of columns fetched=2
                            Number of deleted rows visited=0
                            Number of pages visited=4119
                            Number of rows qualified=19000
                            Number of rows visited=20000
                            Scan type=btree
                            Tree height=4
                            start position:
    >= on first 1 column(s).
    Ordered null semantics on the following columns:
0
                            stop position:
    > on first 1 column(s).
    Ordered null semantics on the following columns:
0
                            qualifiers:
None
                            optimizer estimated row count:     18811881.00
                            optimizer estimated cost:      4708383.46


                Right result set:
                    Index Row to Base Row ResultSet for CATEGORY_MASTER:
                    Number of opens = 19000
                    Rows seen = 19000
                    Columns accessed from heap = {1}
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 186
                        next time (milliseconds) = 497
                        close time (milliseconds) = 0
                        optimizer estimated row count:     18811881.00
                        optimizer estimated cost:     90503959.49

                        Index Scan ResultSet for CATEGORY_MASTER using
constraint SQL090406104857680 at read committed isolation level using share
row locking chosen by the optimizer
                        Number of opens = 19000
                        Rows seen = 19000
                        Rows filtered = 0
                        Fetch Size = 1
                            constructor time (milliseconds) = 0
                            open time (milliseconds) = 186
                            next time (milliseconds) = 419
                            close time (milliseconds) = 0
                            next time in milliseconds/row = 0

                        scan information:
                            Bit set of columns fetched=All
                            Number of columns fetched=2
                            Number of deleted rows visited=0
                            Number of pages visited=38000
                            Number of rows qualified=19000
                            Number of rows visited=19000
                            Scan type=btree
                            Tree height=2
                            start position:
    >= on first 1 column(s).
    Ordered null semantics on the following columns:
0
                            stop position:
    > on first 1 column(s).
    Ordered null semantics on the following columns:
0
                            qualifiers:
None
                            optimizer estimated row count:     18811881.00
                            optimizer estimated cost:     90503959.49

Best regards,

Arindam.
-- 
View this message in context: http://www.nabble.com/URGENT%21%21%21-JDBC-SQL-query-taking-long-time-for-large-IN-clause-tp22927332p22927332.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Strangely I have not seen any email from Mike Segel. I got a response from
Knut Anders. However, compression doesn't seem to be helping. I have tried
with 10.4 - but the initial findings are not very satisfactory. Please can
you let me know your thoughts on the following questions I sent in my
previous email:

QUOTE:
I wanted to get
your pulse about whether Derby can respond in sub 100 millisec time with the
table sizes you see above?

I find that:

select category_master.category_name, count(category_master.category_name)
as category_count
from
        (
               select internal.object_id
               from
               (
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1002) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all
                       values(1001) union all .......
                       values(9999)
               ) as internal(object_id)

       ) as external_ids,
        object_master,
       category_master,
       object_category_mapping
where
        external_ids.object_id = object_master.object_id and
       external_ids.object_id = object_category_mapping.object_id and
        object_master.object_id = object_category_mapping.object_id and
        category_master.category_id = object_category_mapping.category_id
group by
       category_master.category_name
order by
       category_count desc

is much faster unfortunately connection.prepareStatement() is taking way too
much memory (both stack and heap - I have a constraint of 256 MB MAX memory
for my JVM) which goes beyond my applications resources. Is there a way I
can precompile some SQLs which are very expensive to parse during execution.
UNQUOTE:

On Tue, Apr 7, 2009 at 9:36 PM, Bryan Pendleton
<bp...@amberpoint.com>wrote:

> What version of Derby? What operating system? What version of Java?
>> [Arindam] 10.1.3.1; Windows XP; JRE 1.6
>>
>
> Definitely try using Derby 10.4.
>
> Also, Mike Segel made a bunch of other great suggestions in his mail, so
> I suggest following those and see where it gets you.
>
> thanks,
>
> bryan
>

RE: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by de...@segel.com.

Bryan,

Thanks.
Just a mini rant for a second...

KISS == Keep It Simple Stupid. This is an engineering principle that goes to
the core of the problem. If you break a complex problem down in to is base
components, you'll find that you can solve the smaller component problems
easier and then build up to solving the more complex problem. 

So before you panic, (The topic said URGENT!!!), slow down and relax. Go
back and clean up your code and *think*. If you panic, you'll jump to
conclusions and come up with a bad design.

Ok, rant over...

To answer Arindam's question from another post...

Once you open a database connection, you can prepare the SQL queries that
you want to run. As long as you have an open connection and don't drop the
connection, you should be able to run the prepared statements without
incurring additional costs in preparing the statement. So you spend the
~170ms once and not each time you execute the statement.

In my post I mention that you should consider a temp table. Now a caveat...
different databases handle temp tables differently. With Oracle and DB2,
temp tables are a pain to work with. With Informix (IBM's IDS) you can
create temp tables on the fly and they will only persist as long as you
maintain the connection. I'd have to go back and look at how Derby handles
temp tables, but a worst case scenario is that at the start of your app, you
drop the temp table (catch the exception), create the temp, and then prior
to running your query you insert your rows in to the temp table.

For your problem since you've got hundreds+ values in your IN clause, the
first question I have to ask.. are you ensuring that the values in your IN
clause are unique? That is, are you reducing your collection to a minimum?

Also I just saw your post for the results with 5K elements in the IN clause.
No shock there. Now put them in a temp table and join them as part of a sub
select. 

Arindam's question does hit on a problem with a lot of databases. How do
they handle the IN clause? 

A possible feature request could be to allow the end user to join against an
external collection. There's more to this and I'd imagine that you'd end up
with some form of VTI like Informix has.

BTW ... Knut's solution isn't simple but interesting ...

HTH 

-Mike

> -----Original Message-----
> From: Bryan Pendleton [mailto:bpendleton@amberpoint.com]
> Sent: Tuesday, April 07, 2009 11:06 AM
> To: Derby Discussion
> Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN clause
> 
> > What version of Derby? What operating system? What version of Java?
> > [Arindam] 10.1.3.1; Windows XP; JRE 1.6
> 
> Definitely try using Derby 10.4.
> 
> Also, Mike Segel made a bunch of other great suggestions in his mail, so
> I suggest following those and see where it gets you.
> 
> thanks,
> 
> bryan

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Bryan Pendleton <bp...@amberpoint.com>.

> What version of Derby? What operating system? What version of Java?
> [Arindam] 10.1.3.1; Windows XP; JRE 1.6

Definitely try using Derby 10.4.

Also, Mike Segel made a bunch of other great suggestions in his mail, so
I suggest following those and see where it gets you.

thanks,

bryan

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Yes, I think it is unusual.

Can you tell what your system is doing while this query is running? Is it
very CPU-busy? Very disk-busy? Mostly idle?
[Arindam] Nothing much. Just an idle laptop.

What version of Derby? What operating system? What version of Java?
[Arindam] 10.1.3.1; Windows XP; JRE 1.6

How large (in megabytes on disk) are the object_master table and its index?
(You can use the SPACE_TABLE system procedure, I think, to get this info.)

CONGLOMERATENAME        ISINDEX    NUMALLOCPAGES    NUMFREEPAGES
NUMUNFILLEDPAGES    PAGESIZE    ESTIMATEDPAGESAVING

CATEGORY_MASTER            0    103    0    0    4096    0
SQL090406091302600        1    55    0    0    4096    0
SQL090406091302601        1    160    0    1    4096    0
SQL090406091302730        1    1    0    1    4096    0
OBJECT_MASTER            0    10497    0    0    4096    0
SQL090406091302760        1    5340    0    1    4096    0
SQL090406091302761        1    16708    0    410    4096    0
OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0

Can you experiment with larger and smaller sets of values for the IN clause?
If you have just 50 items in the IN clause, do you get the same behavior?
[Arindam] 50 in in-clause is very fast - the whole query returns in 40
millis.

What about if you have 5,000 items in the IN clause? (Actually I'm not sure
if Derby can handle that many.)
[Arindam] I will try this and send it after a few mins.

I suspect that you'll find that it takes 30 seconds regardless of the number
of items in the IN clause, but I may be wrong.
[Arindam] Well actually that doesn't seem to be the case - anything between
1 and 50 is very fast.

Also, what about the query:

 select count(object_id) from object_master
[Arindam] 2 consecutive run:
time taken execute statement = 1421
count = 990099
time taken to get the count value = 2563
time taken execute statement = 219
count = 990099
time taken to get the count value = 2609

How long does this query take?

On Tue, Apr 7, 2009 at 8:52 PM, Bryan Pendleton
<bp...@amberpoint.com>wrote:

> query: select object_id from object_master where object_id in (? ..1000
>> times.. ?) time taken enumerate result set = 31813 (isn't this unusual?
>> enumeration of results taking so much time?)
>>
>
> Yes, I think it is unusual.
>
> Can you tell what your system is doing while this query is running? Is it
> very CPU-busy? Very disk-busy? Mostly idle?
>
> What version of Derby? What operating system? What version of Java?
>
> How large (in megabytes on disk) are the object_master table and its index?
> (You can use the SPACE_TABLE system procedure, I think, to get this info.)
>
> Can you experiment with larger and smaller sets of values for the IN
> clause?
> If you have just 50 items in the IN clause, do you get the same behavior?
> What about if you have 5,000 items in the IN clause? (Actually I'm not sure
> if Derby can handle that many.)
>
> I suspect that you'll find that it takes 30 seconds regardless of the
> number
> of items in the IN clause, but I may be wrong.
>
> Also, what about the query:
>
>  select count(object_id) from object_master
>
> How long does this query take?
>
> thanks,
>
> bryan
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Time taken to execute 5000 INs:

time taken to prepare statement = 500
time taken execute statement = 2127703
 result set count = 5269

On Tue, Apr 7, 2009 at 8:52 PM, Bryan Pendleton
<bp...@amberpoint.com>wrote:

> query: select object_id from object_master where object_id in (? ..1000
>> times.. ?) time taken enumerate result set = 31813 (isn't this unusual?
>> enumeration of results taking so much time?)
>>
>
> Yes, I think it is unusual.
>
> Can you tell what your system is doing while this query is running? Is it
> very CPU-busy? Very disk-busy? Mostly idle?
>
> What version of Derby? What operating system? What version of Java?
>
> How large (in megabytes on disk) are the object_master table and its index?
> (You can use the SPACE_TABLE system procedure, I think, to get this info.)
>
> Can you experiment with larger and smaller sets of values for the IN
> clause?
> If you have just 50 items in the IN clause, do you get the same behavior?
> What about if you have 5,000 items in the IN clause? (Actually I'm not sure
> if Derby can handle that many.)
>
> I suspect that you'll find that it takes 30 seconds regardless of the
> number
> of items in the IN clause, but I may be wrong.
>
> Also, what about the query:
>
>  select count(object_id) from object_master
>
> How long does this query take?
>
> thanks,
>
> bryan
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Bryan Pendleton <bp...@amberpoint.com>.

> query: select object_id from object_master where object_id in (? ..1000 times.. ?) 
> time taken enumerate result set = 31813 (isn't this unusual? enumeration 
> of results taking so much time?)

Yes, I think it is unusual.

Can you tell what your system is doing while this query is running? Is it
very CPU-busy? Very disk-busy? Mostly idle?

What version of Derby? What operating system? What version of Java?

How large (in megabytes on disk) are the object_master table and its index?
(You can use the SPACE_TABLE system procedure, I think, to get this info.)

Can you experiment with larger and smaller sets of values for the IN clause?
If you have just 50 items in the IN clause, do you get the same behavior?
What about if you have 5,000 items in the IN clause? (Actually I'm not sure
if Derby can handle that many.)

I suspect that you'll find that it takes 30 seconds regardless of the number
of items in the IN clause, but I may be wrong.

Also, what about the query:

   select count(object_id) from object_master

How long does this query take?

thanks,

bryan

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Hello Bryan,

query: select object_id from object_master where object_id in (? ..1000
times.. ?)
time taken to prepare statement = 328
time taken execute statement = 31
time taken enumerate result set = 31813 (isn't this unusual? enumeration of
results taking so much time?)

To answer your question regarding result set count for the previous query:
1069

Best regards,

Arindam

On Tue, Apr 7, 2009 at 7:54 PM, Bryan Pendleton
<bp...@amberpoint.com>wrote:

> I would like my query below to return within 100 millisecs. Please help me,
>>
>
> Here's what Derby is doing with your query:
>
> 1) It scans the object_master_index looking for the object_id values in
> your IN list. You provided 1,000 values. Derby sorts your list, finds the
> lowest
> value, and the highest value, and scans that section of the index. During
> this
> processing, Derby looks at 104,896 entries in the index, and discards
> 103,896
> of them, resulting in 1,000 index entries. This requires touching 566 index
> pages,
> which I think are read somewhat randomly, and takes 30 seconds to process:
>
> >                         Rows seen = 104896
> >                         Rows filtered = 103896
> >                             next time (milliseconds) = 31769
> >
> >                         Source result set:
> >                             Index Scan ResultSet for OBJECT_MASTER using
> >                             Rows seen = 104896
> >                                 Number of pages visited=566
>
> Honestly, I'm not quite sure why this step takes 30 seconds. This is a
> puzzle to me.
>
> 2) Derby then takes those 1,000 rows, and joins them against
> object_category_mapping
> using the object_category_mapping index. This results in 19,000 rows in the
> result,
> which I guess means that each object belongs to 19 categories? This
> requires
> performing 1,000 probes into the index, which each touch about 4 pages on
> average:
>
> >                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
> > using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation
> level
> >                         Number of opens = 1000
> >                         Rows seen = 19000
> >                             Number of pages visited=4119
>
> 3) The 19,000 rows resulting from the join are then sorted and grouped by
> category_name, and the category_count is computed. This requires an
> external sort, and results in 1084 category_name groups:
>
> >         Grouped Aggregate ResultSet:
> >         Rows input = 19000
> >         Sort information:
> >             Number of merge runs=1
> >             Number of rows input=19000
> >             Number of rows output=1084
> >             Size of merge runs=[18220]
> >             Sort type=external
>
> 4) lastly, the result is sorted in order to return the results ordered by
> count.
> This is an small internal sort and is very fast.
>
> > Sort ResultSet:
> > Sort information:
> >     Number of rows input=1068
> >     Number of rows output=1068
> >     Sort type=internal
>
> I'm a little confused about why the grouping appears to produce 1,084 rows,
> but the order by only sees 1068. How many rows are in your final result?
>
> This query plan, overall, seems pretty good to me. It doesn't look like
> Derby
> did anything frightenly stupid.
>
> However, I am confused as to why the initial scan of the OBJECT_MASTER for
> the
> object_id values in the IN clause took 30 seconds.
>
> What happens if you shrink your query down, and just do:
>
>  select object_id from object_master where object_id in (?,...,?)
>
> and pass it the exact same 1,000 values for the IN clause. Does that query
> by itself also take 30 seconds?
>
> thanks,
>
> bryan
>
>
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Bryan Pendleton <bp...@amberpoint.com>.

> I would like my query below to return within 100 millisecs. Please help me,

Here's what Derby is doing with your query:

1) It scans the object_master_index looking for the object_id values in
your IN list. You provided 1,000 values. Derby sorts your list, finds the lowest
value, and the highest value, and scans that section of the index. During this
processing, Derby looks at 104,896 entries in the index, and discards 103,896
of them, resulting in 1,000 index entries. This requires touching 566 index pages,
which I think are read somewhat randomly, and takes 30 seconds to process:

 >                         Rows seen = 104896
 >                         Rows filtered = 103896
 >                             next time (milliseconds) = 31769
 >
 >                         Source result set:
 >                             Index Scan ResultSet for OBJECT_MASTER using
 >                             Rows seen = 104896
 >                                 Number of pages visited=566

Honestly, I'm not quite sure why this step takes 30 seconds. This is a puzzle to me.

2) Derby then takes those 1,000 rows, and joins them against object_category_mapping
using the object_category_mapping index. This results in 19,000 rows in the result,
which I guess means that each object belongs to 19 categories? This requires
performing 1,000 probes into the index, which each touch about 4 pages on average:

 >                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
 > using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation level
 >                         Number of opens = 1000
 >                         Rows seen = 19000
 >                             Number of pages visited=4119

3) The 19,000 rows resulting from the join are then sorted and grouped by
category_name, and the category_count is computed. This requires an
external sort, and results in 1084 category_name groups:

 >         Grouped Aggregate ResultSet:
 >         Rows input = 19000
 >         Sort information:
 >             Number of merge runs=1
 >             Number of rows input=19000
 >             Number of rows output=1084
 >             Size of merge runs=[18220]
 >             Sort type=external

4) lastly, the result is sorted in order to return the results ordered by count.
This is an small internal sort and is very fast.

 > Sort ResultSet:
 > Sort information:
 >     Number of rows input=1068
 >     Number of rows output=1068
 >     Sort type=internal

I'm a little confused about why the grouping appears to produce 1,084 rows,
but the order by only sees 1068. How many rows are in your final result?

This query plan, overall, seems pretty good to me. It doesn't look like Derby
did anything frightenly stupid.

However, I am confused as to why the initial scan of the OBJECT_MASTER for the
object_id values in the IN clause took 30 seconds.

What happens if you shrink your query down, and just do:

   select object_id from object_master where object_id in (?,...,?)

and pass it the exact same 1,000 values for the IN clause. Does that query
by itself also take 30 seconds?

thanks,

bryan

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

t sounded like your standard app tends to load a bunch of data and then
deliver those dbs for read only access.  Again this is a minor tweak but
you may see some improvement if you can sort the data by primary key for
insert into the base table.  This will give you a one time clustering that
would improve index scans that require index to base table look up.
Thinking about this, I wonder if we should add an option to compress table
to do this automatically?  It would be very easy to do (ie. a one time
cluster - continual cluster would be a lot more work). All that would have
to happen is at compress time throw all the rows into the sorter and then
insert the result on the other side.
[Arindam] This is a great suggestion. Our DB is dynamic however only one
modification owner can write to it. The rest will see a copy. We can just
compress the master DB before we replicate it to the copies. By the way, I
tested on mysql - it is processing 1000 records in IN clause for the same DB
in 16 millis. I have increased the query page size to 16 MB. I will send the
execution plan as I get them.

On Wed, Apr 8, 2009 at 10:31 PM, Mike Matrigali <mi...@sbcglobal.net>wrote:

> Arindam Bhattacharjee wrote:
>
>> Thanks for the excellent summary Mike. The IN clause will have an unique
>> list of object ids. If we can return 100 object ids in good time, we are
>> okay. The users will be given an option to increase this to MAX 1000 -
>> however, they will be warned that the performance would be slower. The real
>> power of Derby according to us is the flexibility to manage the life cycle
>> of the database files without support from DB admins. We want to make the DB
>> management completely hidden from our customers. However, SolidDB and
>> TimesTen are good options which we can investigate but Derby has the
>> advantage of being used within our organization already. In general we liked
>> its memory footprint - except for one case of SQL parsing, it was mostly
>> well behaved.
>>
>> Question: will changing page size for the tables improve query speed? We
>> will always run one query through multiple threads on Derby and we will not
>> use it for any other purpose. We are okay to be a little slow on the
>> insertions but retrieval should be as quick and fast as possible. The
>> database will be created and replicated. The replicated databases will be
>> read-only.
>>
> I am not sure about page size but seems reasonable to try upping them all
> to 32k.  I usually leave page size to the very end of tuning.  Best is just
> to run the experiment.  You definitely have enough rows so
> you won't be wasting any space going to bigger pages.  The trade off is
> that i/o will be better but processing bigger pages requires more cpu
> for things like binary searches on the btree pages.  But bigger pages may
> also shrink the height of the btree so again it is a tradeoff.
> Another downside is that memory footprint of derby will increase as Derby
> currently sizes it's cache by number of pages so where you may
> have 1000 (default cache size is 1000 pages) 4k pages cached you now might
> get 1000 32k pages.
>
> It sounded like your standard app tends to load a bunch of data and then
> deliver those dbs for read only access.  Again this is a minor tweak but
> you may see some improvement if you can sort the data by primary key for
> insert into the base table.  This will give you a one time clustering that
> would improve index scans that require index to base table look up.
> Thinking about this, I wonder if we should add an option to compress table
> to do this automatically?  It would be very easy to do (ie. a one time
> cluster - continual cluster would be a lot more work). All that would have
> to happen is at compress time throw all the rows into the sorter and then
> insert the result on the other side.
>
>
>> Best regards,
>>
>> Arindam.
>>
>>
>> On Wed, Apr 8, 2009 at 6:23 PM, <derby@segel.com <ma...@segel.com>>
>> wrote:
>>
>>
>>    Arindam,
>>
>>
>>    Ok,
>>
>>
>>    Just a few things…
>>
>>
>>    1)       Derby is a free, open source, product. Warts and all, its
>>    not going to be your best choice for a commercial rdbms product.
>>    (Sorry Derby fans, you get what you pay for.)
>>
>>    2)       You want speed, you’re going to have to consider IBM’s
>>    SolidDB which is an in memory database along with Oracle’s TimesTen
>>    database. This will give you the speed that you want.
>>
>>    3)       You’re running on a laptop where your disk drive could be a
>>    5400 rpm IDE drive. Laptop drive == low energy consumption and low
>>    performance. Disk based solutions will be much slower on a laptop
>>    than on a ‘comparable’ desktop and/or server.
>>
>>    4)       I’m not sure why you had to load and then compress your
>>    tables. I could understand that if you created your index prior to
>>    loading the data that you could have to update the statistics.
>>
>>    5)       Since we don’t know what you’re actually trying to do,
>>    there could be a better or more efficient design. Having an IN
>>    clause with 1000’s of entries is definitely not a good design for
>>    any database.
>>
>>    6)       You never did answer the question about the object ids that
>>    were in the IN clause. Were they unique or were there duplicates?
>>
>>
>>
>>    HTH
>>
>>
>>    -Mike
>>
>>
>>  ------------------------------------------------------------------------
>>
>>    *From:* Arindam Bhattacharjee
>>    [mailto:mr.arindam.bhattacharjee@gmail.com
>>    <ma...@gmail.com>]
>>    *Sent:* Wednesday, April 08, 2009 2:14 AM
>>
>>    *To:* Derby Discussion; msegel@segel.com <ma...@segel.com>
>>
>>    *Subject:* Re: URGENT!!! JDBC SQL query taking long time for large
>>    IN clause
>>
>>
>>    Few inputs/observations:
>>
>>    a) This is not a school project. This is a project which makes money
>>    for us today. We have functioning product running today, developed
>>    by us which doesn't use any RDBMS's to achieve what I am trying to
>>    achieve with the tables. For 200,000 object master we get 45 - 60
>>    millis performance. The reason we are trying to use an RDBMS is to
>>    ensure that we can scale even higher and updates are easier and more
>>    maintenable. Hence all these questions.
>>
>>    b) I tried removing the "preserve rows" from the declare temp table
>>    clause - and that didn't help me either.
>>
>>    c) The data base which I have created is just a sample database
>>    which vaguely represents the load which we will see if we implement
>>    the existing feature of ours, using Derby RDBMS. It doesn't contain
>>    REAL data.
>>
>>    d) The entire database was created in one shot, as I have stated
>>    earlier, and records are populated in one go. That was the actual
>>    problem - thanks to Knuth for indicating that as a possibility. When
>>    I compress the tables - the quries start performing a LOT faster! 1
>>    million in object master, 20 million in object category mapping and
>>    10 K in category master and 1000 in IN clause gives me 1100 millis
>>    speed now - down from 30 seconds. That probably explains why the
>>    object master index scans were taking nearly 30 seconds in the
>>    runtimestats which I posted. Which is slow, but still manageable
>>    since the 100 and 500s are much faster. However, that is still a lot
>>    slower than other databases we are testing on. Since Derby is
>>    embedded we expected it to be much faster since no IPC is required
>>    between the client driver and the network server.
>>
>>    e) IN doesn't function properly beyond 1000 object ids, and we used
>>    a regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY).
>>    Insertion of 5000 records into this happened in 100 millis (after
>>    compression of the tables) and the query took 4 seconds. With
>>    executeUpdate() or execute() (as suggested) insertion was at least
>>    200% slower. So the execute batch works better for the inserts into
>>    the table for the payloads required by us. The temp table query took
>>    10 MINS (this time I waited patiently for it to finish - there was
>>    nothing wrong there - the query just takes too much time and I
>>    didn't have patience to wait for it to finish the first time :) ) to
>>    execute while the regular table with primary key took just 4 seconds.
>>
>>    f) The stack overflow exception for "select ... values...union" was
>>    much less of a problem in 10.4 compared to 10.1 since the memory
>>    usage is tapering off to the -mx<Size In Megs> for the JVM. But,
>>    since it cannot be parameterized it isn't of much use to us. IN
>>    clause based queries are getting parsed with much lower memory
>>    footprints.
>>
>>    Best regards,
>>
>>    Arindam.
>>
>>    On Wed, Apr 8, 2009 at 2:39 AM, <derby@segel.com
>>    <ma...@segel.com>> wrote:
>>
>>
>>    >  -----Original Message-----
>>    >  From: Mike Matrigali [mailto:mikem_app@sbcglobal.net
>>    <ma...@sbcglobal.net>]
>>    >  Sent: Tuesday, April 07, 2009 2:05 PM
>>    >  To: Derby Discussion
>>
>>    >  Subject: Re: URGENT!!! JDBC SQL query taking long time for large
>>    IN clause
>>    >
>>
>>    >  It is impossible to say what the performance of the query can be
>>    without
>>    >  knowing exact values of all the values of the IN LIST.  But it is
>>    >  possible to get some idea assuming some worst case behavior, and
>> from
>>    >  that I am going to guess you will never come close to 100ms with an
>>    >  uncached database, on hardware using some sort of standard disk
>> based
>>    >  hard drive.
>>    >
>>    >  I do think the query may go faster with index and query tweeking,
>> but
>>    >  100ms to an uncached db and non-clustered unique values in that IN
>>    list
>>    >  is never going to go that fast.  Adding up just what is posted it
>>    looks
>>    >  like this is a 1.2 gig db.
>>    >
>>
>>    Drop the unnecessary indexes and you'll see the database size shrink
>>    fast.
>>    Also note that he's running this on a Windows XP laptop. Depending
>>    on the
>>    model of the lap top, you will have not only CPU issues but also
>>    disk i/o
>>    issues as well. (5400 rpm IDE as an example....)
>>
>>    However, it is possible for the OP to get better performance, if not
>>    realistically 100ms performance. (BTW where did 100ms come from? I'm
>>    sorry
>>    but this really sounds like a class project...)
>>
>>
>>    >  You posted the space for the tables and indexes.  The interesting
>> ones
>>    >  are the big ones.  You have 5 tables or indexes over 1000 pages
>>    big.  If
>>    >  in the worst case your 1000 value IN list happens to be on 1000
>>    >  different pages then Derby is going to need to do at least 1000
>>    i/o's to
>>    >  get to them - I usually use back of envelope of max 100 i/o's per
>>    second
>>    >  (even if your disk has specs that say higher rate this I/O is not
>>    >  going to
>>    >  get streamed as fast as possible by this query, it is going to ask
>> for
>>    >  page, process it, do some join work then later ask for another
>>    page, ...)
>>    >  :
>>    >  > CATEGORY_MASTER            0    103    0    0    4096    0
>>    >  > SQL090406091302600        1    55    0    0    4096    0
>>    >  > SQL090406091302601        1    160    0    1    4096    0
>>    >  > SQL090406091302730        1    1    0    1    4096    0
>>    >  > OBJECT_MASTER            0    10497    0    0    4096    0
>>    >  > SQL090406091302760        1    5340    0    1    4096    0
>>    >  > SQL090406091302761        1    16708    0    410    4096    0
>>    >  > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
>>    >  > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096
>>  0
>>    >
>>
>>    Mike,
>>
>>    I think that a lot of this information is a bit skewed. Outside of the
>>    primary index, the indexes he created included the varchar field.
>>    Not sure
>>    why he did this except under the impression that he'd only have to
>>    hit the
>>    index and not the underlying table. While there is some potential
>>    merit to
>>    this, I think that there are things that he can do to improve
>>    performance.
>>    (Hence my post about reworking the query itself and using a temp
>> table.)
>>    Drop those indexes and you'll see a big change in database size.
>>
>>
>>    >  There was work done in 10.3 on IN-LISTS, making them perform more
>> like
>>    >  unions,  See DERBY-47.  So if you have a choice of releases I would
>>    >  suggest you move to 10.4 and post query plan and results against
>> that.
>>    >  The basic idea of that change was to allow the
>>    >  system to do 1 probe into an index for each value in the IN-LIST,
>>    before
>>    >  this change DERBY could only sort the values in the IN list and then
>>    >  limit a index scan to the lowest and biggest values in the in list.
>>    >  So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it
>> might
>>    >  have to scan 112177 pages to find the 1000 rows, where worst case
>> for
>>    >  probing would be 1000 page (plus btree parent index pages, but those
>>    >  are much more likely cached).  The problem is that there is
>> definitely
>>    >  overhead for probing one at a time, scans go much faster - so there
>> is
>>    >  a crossover point - ie. I would guess it would likely better to
>>    scan all
>>    >  112177 pages then do 100,000 probes.
>>    >
>>
>>    I believe that it was already recommended that he do just that.
>>    There are two ways he could use the temp table. As a sub-select
>>    statement,
>>    or as part of the table join.
>>
>>    I think this would bypass the whole use of the IN list. I'm still
>>    not 100%
>>    sure why there's 100+ values coming from an outside source. Based on
>> his
>>    query below it looks like the object_ids in the IN clause are not
>>    unique...
>>
>>    Its kind of hard trying to help someone when you don't know the whole
>>    problem....
>>
>>    -Mike
>>
>>
>>    >  arindam.bhattacharjee wrote:
>>    >  > Hello Knut,
>>    >  >
>>    >  > Thanks for your quick response. This is a sample database which
>>    I have
>>    >  > created just for testing out the performance and has been written
>> to
>>    >  only
>>    >  > once in one go. I tried temp tables but that is just too slow.
>>    The IN
>>    >  clause
>>    >  > has values which comes from another source and I can't modify
>> that.
>>    >  >
>>    >  > However, I will try out what you state below. But still, I
>>    wanted to get
>>    >  > your pulse about whether Derby can respond in sub 100 millisec
>>    time with
>>    >  the
>>    >  > table sizes you see above?
>>    >  >
>>    >  > I find that:
>>    >  >
>>    >  > select category_master.category_name,
>>    >  count(category_master.category_name)
>>    >  > as category_count
>>    >  > from
>>    >  >     (
>>    >  >             select internal.object_id
>>    >  >             from
>>    >  >             (
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1002) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all
>>    >  >                     values(1001) union all .......
>>    >  >                     values(9999)
>>    >  >             ) as internal(object_id)
>>    >  >
>>    >  >     ) as external_ids,
>>    >  >     object_master,
>>    >  >     category_master,
>>    >  >     object_category_mapping
>>    >  > where
>>    >  >     external_ids.object_id = object_master.object_id and
>>    >  >     external_ids.object_id = object_category_mapping.object_id and
>>    >  >     object_master.object_id = object_category_mapping.object_id
>> and
>>    >  >     category_master.category_id =
>>    object_category_mapping.category_id
>>    >  > group by
>>    >  >     category_master.category_name
>>    >  > order by
>>    >  >     category_count desc
>>    >  >
>>    >  > is much faster unfortunately connection.prepareStatement() is
>>    taking way
>>    >  too
>>    >  > much memory (both stack and heap - I have a constraint of 256 MB
>> MAX
>>    >  memory
>>    >  > for my JVM) which goes beyond my applications resources. Is
>>    there a way
>>    >  I
>>    >  > can precompile some SQLs which are very expensive to parse during
>>    >  execution.
>>    >  >
>>    >  > Best regards,
>>    >  >
>>    >  > Arindam.
>>    >  >
>>    >  >
>>    >  > Knut Anders Hatlen wrote:
>>    >  >> "arindam.bhattacharjee" <mr.arindam.bhattacharjee@gmail.com
>>    <ma...@gmail.com>> writes:
>>    >  >>
>>    >  >>> Hello,
>>    >  >>>
>>    >  >>> I would like my query below to return within 100 millisecs.
>> Please
>>    >  help
>>    >  >>> me,
>>    >  >>> and the values for the IN clause comes from outside hence cannot
>>    >  really
>>    >  >>> change the IN clause to a join on an existing table.
>>    >  >> Hi Arindam,
>>    >  >>
>>    >  >> Does the query run faster if you compress all the tables
>>    involved, or
>>    >  if
>>    >  >> you drop and recreate all the indexes? If so, it is likely that
>> the
>>    >  >> index cardinality statistics are out of date, which may make the
>>    >  >> optimizer pick a bad execution plan. Currently, index cardinality
>>    >  >> statistics are only updated at index creation time, when tables
>> are
>>    >  >> compressed, and when columns are dropped. A more automatic
>>    solution is
>>    >  >> being worked on. For more details, see:
>>    >  >>
>>    >  >> https://issues.apache.org/jira/browse/DERBY-269
>>    >  >> https://issues.apache.org/jira/browse/DERBY-3788
>>    >  >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
>>    >  >>
>>    >  >> You may be experiencing some other problem, but this is a
>>    problem that
>>    >  >> keeps coming up, so I think it's worth checking.
>>    >  >>
>>    >  >> Hope this helps,
>>    >  >>
>>    >  >> --
>>    >  >> Knut Anders
>>    >  >>
>>    >  >>
>>    >  >
>>
>>
>>
>>
>>
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Arindam Bhattacharjee wrote:
> Thanks for the excellent summary Mike. The IN clause will have an unique 
> list of object ids. If we can return 100 object ids in good time, we are 
> okay. The users will be given an option to increase this to MAX 1000 - 
> however, they will be warned that the performance would be slower. The 
> real power of Derby according to us is the flexibility to manage the 
> life cycle of the database files without support from DB admins. We want 
> to make the DB management completely hidden from our customers. However, 
> SolidDB and TimesTen are good options which we can investigate but Derby 
> has the advantage of being used within our organization already. In 
> general we liked its memory footprint - except for one case of SQL 
> parsing, it was mostly well behaved.
> 
> Question: will changing page size for the tables improve query speed? We 
> will always run one query through multiple threads on Derby and we will 
> not use it for any other purpose. We are okay to be a little slow on the 
> insertions but retrieval should be as quick and fast as possible. The 
> database will be created and replicated. The replicated databases will 
> be read-only.
I am not sure about page size but seems reasonable to try upping them 
all to 32k.  I usually leave page size to the very end of tuning.  Best 
is just to run the experiment.  You definitely have enough rows so
you won't be wasting any space going to bigger pages.  The trade off is
that i/o will be better but processing bigger pages requires more cpu
for things like binary searches on the btree pages.  But bigger pages 
may also shrink the height of the btree so again it is a tradeoff.
Another downside is that memory footprint of derby will increase as 
Derby currently sizes it's cache by number of pages so where you may
have 1000 (default cache size is 1000 pages) 4k pages cached you now 
might get 1000 32k pages.

It sounded like your standard app tends to load a bunch of data and then
deliver those dbs for read only access.  Again this is a minor tweak but
you may see some improvement if you can sort the data by primary key for
insert into the base table.  This will give you a one time clustering 
that would improve index scans that require index to base table look up.
Thinking about this, I wonder if we should add an option to compress 
table to do this automatically?  It would be very easy to do (ie. a one 
time cluster - continual cluster would be a lot more work). All that 
would have to happen is at compress time throw all the rows into the 
sorter and then insert the result on the other side.

> 
> Best regards,
> 
> Arindam.
> 
> On Wed, Apr 8, 2009 at 6:23 PM, <derby@segel.com 
> <ma...@segel.com>> wrote:
> 
>      
> 
>     Arindam,
> 
>      
> 
>     Ok,
> 
>      
> 
>     Just a few things…
> 
>      
> 
>     1)       Derby is a free, open source, product. Warts and all, its
>     not going to be your best choice for a commercial rdbms product.
>     (Sorry Derby fans, you get what you pay for.)
> 
>     2)       You want speed, you’re going to have to consider IBM’s
>     SolidDB which is an in memory database along with Oracle’s TimesTen
>     database. This will give you the speed that you want.
> 
>     3)       You’re running on a laptop where your disk drive could be a
>     5400 rpm IDE drive. Laptop drive == low energy consumption and low
>     performance. Disk based solutions will be much slower on a laptop
>     than on a ‘comparable’ desktop and/or server.
> 
>     4)       I’m not sure why you had to load and then compress your
>     tables. I could understand that if you created your index prior to
>     loading the data that you could have to update the statistics.
> 
>     5)       Since we don’t know what you’re actually trying to do,
>     there could be a better or more efficient design. Having an IN
>     clause with 1000’s of entries is definitely not a good design for
>     any database.
> 
>     6)       You never did answer the question about the object ids that
>     were in the IN clause. Were they unique or were there duplicates?
> 
>      
> 
>      
> 
>     HTH
> 
>      
> 
>     -Mike
> 
>     ------------------------------------------------------------------------
> 
>     *From:* Arindam Bhattacharjee
>     [mailto:mr.arindam.bhattacharjee@gmail.com
>     <ma...@gmail.com>]
>     *Sent:* Wednesday, April 08, 2009 2:14 AM
> 
>     *To:* Derby Discussion; msegel@segel.com <ma...@segel.com>
>     *Subject:* Re: URGENT!!! JDBC SQL query taking long time for large
>     IN clause
> 
>      
> 
>     Few inputs/observations:
> 
>     a) This is not a school project. This is a project which makes money
>     for us today. We have functioning product running today, developed
>     by us which doesn't use any RDBMS's to achieve what I am trying to
>     achieve with the tables. For 200,000 object master we get 45 - 60
>     millis performance. The reason we are trying to use an RDBMS is to
>     ensure that we can scale even higher and updates are easier and more
>     maintenable. Hence all these questions.
> 
>     b) I tried removing the "preserve rows" from the declare temp table
>     clause - and that didn't help me either.
> 
>     c) The data base which I have created is just a sample database
>     which vaguely represents the load which we will see if we implement
>     the existing feature of ours, using Derby RDBMS. It doesn't contain
>     REAL data.
> 
>     d) The entire database was created in one shot, as I have stated
>     earlier, and records are populated in one go. That was the actual
>     problem - thanks to Knuth for indicating that as a possibility. When
>     I compress the tables - the quries start performing a LOT faster! 1
>     million in object master, 20 million in object category mapping and
>     10 K in category master and 1000 in IN clause gives me 1100 millis
>     speed now - down from 30 seconds. That probably explains why the
>     object master index scans were taking nearly 30 seconds in the
>     runtimestats which I posted. Which is slow, but still manageable
>     since the 100 and 500s are much faster. However, that is still a lot
>     slower than other databases we are testing on. Since Derby is
>     embedded we expected it to be much faster since no IPC is required
>     between the client driver and the network server.
> 
>     e) IN doesn't function properly beyond 1000 object ids, and we used
>     a regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY).
>     Insertion of 5000 records into this happened in 100 millis (after
>     compression of the tables) and the query took 4 seconds. With
>     executeUpdate() or execute() (as suggested) insertion was at least
>     200% slower. So the execute batch works better for the inserts into
>     the table for the payloads required by us. The temp table query took
>     10 MINS (this time I waited patiently for it to finish - there was
>     nothing wrong there - the query just takes too much time and I
>     didn't have patience to wait for it to finish the first time :) ) to
>     execute while the regular table with primary key took just 4 seconds.
> 
>     f) The stack overflow exception for "select ... values...union" was
>     much less of a problem in 10.4 compared to 10.1 since the memory
>     usage is tapering off to the -mx<Size In Megs> for the JVM. But,
>     since it cannot be parameterized it isn't of much use to us. IN
>     clause based queries are getting parsed with much lower memory
>     footprints.
> 
>     Best regards,
> 
>     Arindam.
> 
>     On Wed, Apr 8, 2009 at 2:39 AM, <derby@segel.com
>     <ma...@segel.com>> wrote:
> 
> 
>     >  -----Original Message-----
>     >  From: Mike Matrigali [mailto:mikem_app@sbcglobal.net
>     <ma...@sbcglobal.net>]
>     >  Sent: Tuesday, April 07, 2009 2:05 PM
>     >  To: Derby Discussion
> 
>     >  Subject: Re: URGENT!!! JDBC SQL query taking long time for large
>     IN clause
>     >
> 
>     >  It is impossible to say what the performance of the query can be
>     without
>     >  knowing exact values of all the values of the IN LIST.  But it is
>     >  possible to get some idea assuming some worst case behavior, and from
>     >  that I am going to guess you will never come close to 100ms with an
>     >  uncached database, on hardware using some sort of standard disk based
>     >  hard drive.
>     >
>     >  I do think the query may go faster with index and query tweeking, but
>     >  100ms to an uncached db and non-clustered unique values in that IN
>     list
>     >  is never going to go that fast.  Adding up just what is posted it
>     looks
>     >  like this is a 1.2 gig db.
>     >
> 
>     Drop the unnecessary indexes and you'll see the database size shrink
>     fast.
>     Also note that he's running this on a Windows XP laptop. Depending
>     on the
>     model of the lap top, you will have not only CPU issues but also
>     disk i/o
>     issues as well. (5400 rpm IDE as an example....)
> 
>     However, it is possible for the OP to get better performance, if not
>     realistically 100ms performance. (BTW where did 100ms come from? I'm
>     sorry
>     but this really sounds like a class project...)
> 
> 
>     >  You posted the space for the tables and indexes.  The interesting ones
>     >  are the big ones.  You have 5 tables or indexes over 1000 pages
>     big.  If
>     >  in the worst case your 1000 value IN list happens to be on 1000
>     >  different pages then Derby is going to need to do at least 1000
>     i/o's to
>     >  get to them - I usually use back of envelope of max 100 i/o's per
>     second
>     >  (even if your disk has specs that say higher rate this I/O is not
>     >  going to
>     >  get streamed as fast as possible by this query, it is going to ask for
>     >  page, process it, do some join work then later ask for another
>     page, ...)
>     >  :
>     >  > CATEGORY_MASTER            0    103    0    0    4096    0
>     >  > SQL090406091302600        1    55    0    0    4096    0
>     >  > SQL090406091302601        1    160    0    1    4096    0
>     >  > SQL090406091302730        1    1    0    1    4096    0
>     >  > OBJECT_MASTER            0    10497    0    0    4096    0
>     >  > SQL090406091302760        1    5340    0    1    4096    0
>     >  > SQL090406091302761        1    16708    0    410    4096    0
>     >  > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
>     >  > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0
>     >
> 
>     Mike,
> 
>     I think that a lot of this information is a bit skewed. Outside of the
>     primary index, the indexes he created included the varchar field.
>     Not sure
>     why he did this except under the impression that he'd only have to
>     hit the
>     index and not the underlying table. While there is some potential
>     merit to
>     this, I think that there are things that he can do to improve
>     performance.
>     (Hence my post about reworking the query itself and using a temp table.)
>     Drop those indexes and you'll see a big change in database size.
> 
> 
>     >  There was work done in 10.3 on IN-LISTS, making them perform more like
>     >  unions,  See DERBY-47.  So if you have a choice of releases I would
>     >  suggest you move to 10.4 and post query plan and results against that.
>     >  The basic idea of that change was to allow the
>     >  system to do 1 probe into an index for each value in the IN-LIST,
>     before
>     >  this change DERBY could only sort the values in the IN list and then
>     >  limit a index scan to the lowest and biggest values in the in list.
>     >  So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
>     >  have to scan 112177 pages to find the 1000 rows, where worst case for
>     >  probing would be 1000 page (plus btree parent index pages, but those
>     >  are much more likely cached).  The problem is that there is definitely
>     >  overhead for probing one at a time, scans go much faster - so there is
>     >  a crossover point - ie. I would guess it would likely better to
>     scan all
>     >  112177 pages then do 100,000 probes.
>     >
> 
>     I believe that it was already recommended that he do just that.
>     There are two ways he could use the temp table. As a sub-select
>     statement,
>     or as part of the table join.
> 
>     I think this would bypass the whole use of the IN list. I'm still
>     not 100%
>     sure why there's 100+ values coming from an outside source. Based on his
>     query below it looks like the object_ids in the IN clause are not
>     unique...
> 
>     Its kind of hard trying to help someone when you don't know the whole
>     problem....
> 
>     -Mike
> 
> 
>     >  arindam.bhattacharjee wrote:
>     >  > Hello Knut,
>     >  >
>     >  > Thanks for your quick response. This is a sample database which
>     I have
>     >  > created just for testing out the performance and has been written to
>     >  only
>     >  > once in one go. I tried temp tables but that is just too slow.
>     The IN
>     >  clause
>     >  > has values which comes from another source and I can't modify that.
>     >  >
>     >  > However, I will try out what you state below. But still, I
>     wanted to get
>     >  > your pulse about whether Derby can respond in sub 100 millisec
>     time with
>     >  the
>     >  > table sizes you see above?
>     >  >
>     >  > I find that:
>     >  >
>     >  > select category_master.category_name,
>     >  count(category_master.category_name)
>     >  > as category_count
>     >  > from
>     >  >     (
>     >  >             select internal.object_id
>     >  >             from
>     >  >             (
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1002) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all
>     >  >                     values(1001) union all .......
>     >  >                     values(9999)
>     >  >             ) as internal(object_id)
>     >  >
>     >  >     ) as external_ids,
>     >  >     object_master,
>     >  >     category_master,
>     >  >     object_category_mapping
>     >  > where
>     >  >     external_ids.object_id = object_master.object_id and
>     >  >     external_ids.object_id = object_category_mapping.object_id and
>     >  >     object_master.object_id = object_category_mapping.object_id and
>     >  >     category_master.category_id =
>     object_category_mapping.category_id
>     >  > group by
>     >  >     category_master.category_name
>     >  > order by
>     >  >     category_count desc
>     >  >
>     >  > is much faster unfortunately connection.prepareStatement() is
>     taking way
>     >  too
>     >  > much memory (both stack and heap - I have a constraint of 256 MB MAX
>     >  memory
>     >  > for my JVM) which goes beyond my applications resources. Is
>     there a way
>     >  I
>     >  > can precompile some SQLs which are very expensive to parse during
>     >  execution.
>     >  >
>     >  > Best regards,
>     >  >
>     >  > Arindam.
>     >  >
>     >  >
>     >  > Knut Anders Hatlen wrote:
>     >  >> "arindam.bhattacharjee" <mr.arindam.bhattacharjee@gmail.com
>     <ma...@gmail.com>> writes:
>     >  >>
>     >  >>> Hello,
>     >  >>>
>     >  >>> I would like my query below to return within 100 millisecs. Please
>     >  help
>     >  >>> me,
>     >  >>> and the values for the IN clause comes from outside hence cannot
>     >  really
>     >  >>> change the IN clause to a join on an existing table.
>     >  >> Hi Arindam,
>     >  >>
>     >  >> Does the query run faster if you compress all the tables
>     involved, or
>     >  if
>     >  >> you drop and recreate all the indexes? If so, it is likely that the
>     >  >> index cardinality statistics are out of date, which may make the
>     >  >> optimizer pick a bad execution plan. Currently, index cardinality
>     >  >> statistics are only updated at index creation time, when tables are
>     >  >> compressed, and when columns are dropped. A more automatic
>     solution is
>     >  >> being worked on. For more details, see:
>     >  >>
>     >  >> https://issues.apache.org/jira/browse/DERBY-269
>     >  >> https://issues.apache.org/jira/browse/DERBY-3788
>     >  >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
>     >  >>
>     >  >> You may be experiencing some other problem, but this is a
>     problem that
>     >  >> keeps coming up, so I think it's worth checking.
>     >  >>
>     >  >> Hope this helps,
>     >  >>
>     >  >> --
>     >  >> Knut Anders
>     >  >>
>     >  >>
>     >  >
> 
> 
>      
> 
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Thanks for the excellent summary Mike. The IN clause will have an unique
list of object ids. If we can return 100 object ids in good time, we are
okay. The users will be given an option to increase this to MAX 1000 -
however, they will be warned that the performance would be slower. The real
power of Derby according to us is the flexibility to manage the life cycle
of the database files without support from DB admins. We want to make the DB
management completely hidden from our customers. However, SolidDB and
TimesTen are good options which we can investigate but Derby has the
advantage of being used within our organization already. In general we liked
its memory footprint - except for one case of SQL parsing, it was mostly
well behaved.

Question: will changing page size for the tables improve query speed? We
will always run one query through multiple threads on Derby and we will not
use it for any other purpose. We are okay to be a little slow on the
insertions but retrieval should be as quick and fast as possible. The
database will be created and replicated. The replicated databases will be
read-only.

Best regards,

Arindam.

On Wed, Apr 8, 2009 at 6:23 PM, <de...@segel.com> wrote:

>
>
> Arindam,
>
>
>
> Ok,
>
>
>
> Just a few things…
>
>
>
> 1)       Derby is a free, open source, product. Warts and all, its not
> going to be your best choice for a commercial rdbms product. (Sorry Derby
> fans, you get what you pay for.)
>
> 2)       You want speed, you’re going to have to consider IBM’s SolidDB
> which is an in memory database along with Oracle’s TimesTen database. This
> will give you the speed that you want.
>
> 3)       You’re running on a laptop where your disk drive could be a 5400
> rpm IDE drive. Laptop drive == low energy consumption and low performance.
> Disk based solutions will be much slower on a laptop than on a ‘comparable’
> desktop and/or server.
>
> 4)       I’m not sure why you had to load and then compress your tables. I
> could understand that if you created your index prior to loading the data
> that you could have to update the statistics.
>
> 5)       Since we don’t know what you’re actually trying to do, there
> could be a better or more efficient design. Having an IN clause with 1000’s
> of entries is definitely not a good design for any database.
>
> 6)       You never did answer the question about the object ids that were
> in the IN clause. Were they unique or were there duplicates?
>
>
>
>
>
> HTH
>
>
>
> -Mike
>   ------------------------------
>
> *From:* Arindam Bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com]
> *Sent:* Wednesday, April 08, 2009 2:14 AM
> *To:* Derby Discussion; msegel@segel.com
> *Subject:* Re: URGENT!!! JDBC SQL query taking long time for large IN
> clause
>
>
>
> Few inputs/observations:
>
> a) This is not a school project. This is a project which makes money for us
> today. We have functioning product running today, developed by us which
> doesn't use any RDBMS's to achieve what I am trying to achieve with the
> tables. For 200,000 object master we get 45 - 60 millis performance. The
> reason we are trying to use an RDBMS is to ensure that we can scale even
> higher and updates are easier and more maintenable. Hence all these
> questions.
>
> b) I tried removing the "preserve rows" from the declare temp table clause
> - and that didn't help me either.
>
> c) The data base which I have created is just a sample database which
> vaguely represents the load which we will see if we implement the existing
> feature of ours, using Derby RDBMS. It doesn't contain REAL data.
>
> d) The entire database was created in one shot, as I have stated earlier,
> and records are populated in one go. That was the actual problem - thanks to
> Knuth for indicating that as a possibility. When I compress the tables - the
> quries start performing a LOT faster! 1 million in object master, 20 million
> in object category mapping and 10 K in category master and 1000 in IN clause
> gives me 1100 millis speed now - down from 30 seconds. That probably
> explains why the object master index scans were taking nearly 30 seconds in
> the runtimestats which I posted. Which is slow, but still manageable since
> the 100 and 500s are much faster. However, that is still a lot slower than
> other databases we are testing on. Since Derby is embedded we expected it to
> be much faster since no IPC is required between the client driver and the
> network server.
>
> e) IN doesn't function properly beyond 1000 object ids, and we used a
> regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY). Insertion of 5000
> records into this happened in 100 millis (after compression of the tables)
> and the query took 4 seconds. With executeUpdate() or execute() (as
> suggested) insertion was at least 200% slower. So the execute batch works
> better for the inserts into the table for the payloads required by us. The
> temp table query took 10 MINS (this time I waited patiently for it to finish
> - there was nothing wrong there - the query just takes too much time and I
> didn't have patience to wait for it to finish the first time :) ) to execute
> while the regular table with primary key took just 4 seconds.
>
> f) The stack overflow exception for "select ... values...union" was much
> less of a problem in 10.4 compared to 10.1 since the memory usage is
> tapering off to the -mx<Size In Megs> for the JVM. But, since it cannot be
> parameterized it isn't of much use to us. IN clause based queries are
> getting parsed with much lower memory footprints.
>
> Best regards,
>
> Arindam.
>
> On Wed, Apr 8, 2009 at 2:39 AM, <de...@segel.com> wrote:
>
>
> > -----Original Message-----
> > From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
> > Sent: Tuesday, April 07, 2009 2:05 PM
> > To: Derby Discussion
>
> > Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN
> clause
> >
>
> > It is impossible to say what the performance of the query can be without
> > knowing exact values of all the values of the IN LIST.  But it is
> > possible to get some idea assuming some worst case behavior, and from
> > that I am going to guess you will never come close to 100ms with an
> > uncached database, on hardware using some sort of standard disk based
> > hard drive.
> >
> > I do think the query may go faster with index and query tweeking, but
> > 100ms to an uncached db and non-clustered unique values in that IN list
> > is never going to go that fast.  Adding up just what is posted it looks
> > like this is a 1.2 gig db.
> >
>
> Drop the unnecessary indexes and you'll see the database size shrink fast.
> Also note that he's running this on a Windows XP laptop. Depending on the
> model of the lap top, you will have not only CPU issues but also disk i/o
> issues as well. (5400 rpm IDE as an example....)
>
> However, it is possible for the OP to get better performance, if not
> realistically 100ms performance. (BTW where did 100ms come from? I'm sorry
> but this really sounds like a class project...)
>
>
> > You posted the space for the tables and indexes.  The interesting ones
> > are the big ones.  You have 5 tables or indexes over 1000 pages big.  If
> > in the worst case your 1000 value IN list happens to be on 1000
> > different pages then Derby is going to need to do at least 1000 i/o's to
> > get to them - I usually use back of envelope of max 100 i/o's per second
> > (even if your disk has specs that say higher rate this I/O is not
> > going to
> > get streamed as fast as possible by this query, it is going to ask for
> > page, process it, do some join work then later ask for another page, ...)
> > :
> > > CATEGORY_MASTER            0    103    0    0    4096    0
> > > SQL090406091302600        1    55    0    0    4096    0
> > > SQL090406091302601        1    160    0    1    4096    0
> > > SQL090406091302730        1    1    0    1    4096    0
> > > OBJECT_MASTER            0    10497    0    0    4096    0
> > > SQL090406091302760        1    5340    0    1    4096    0
> > > SQL090406091302761        1    16708    0    410    4096    0
> > > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
> > > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0
> >
>
> Mike,
>
> I think that a lot of this information is a bit skewed. Outside of the
> primary index, the indexes he created included the varchar field. Not sure
> why he did this except under the impression that he'd only have to hit the
> index and not the underlying table. While there is some potential merit to
> this, I think that there are things that he can do to improve performance.
> (Hence my post about reworking the query itself and using a temp table.)
> Drop those indexes and you'll see a big change in database size.
>
>
> > There was work done in 10.3 on IN-LISTS, making them perform more like
> > unions,  See DERBY-47.  So if you have a choice of releases I would
> > suggest you move to 10.4 and post query plan and results against that.
> > The basic idea of that change was to allow the
> > system to do 1 probe into an index for each value in the IN-LIST, before
> > this change DERBY could only sort the values in the IN list and then
> > limit a index scan to the lowest and biggest values in the in list.
> > So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
> > have to scan 112177 pages to find the 1000 rows, where worst case for
> > probing would be 1000 page (plus btree parent index pages, but those
> > are much more likely cached).  The problem is that there is definitely
> > overhead for probing one at a time, scans go much faster - so there is
> > a crossover point - ie. I would guess it would likely better to scan all
> > 112177 pages then do 100,000 probes.
> >
>
> I believe that it was already recommended that he do just that.
> There are two ways he could use the temp table. As a sub-select statement,
> or as part of the table join.
>
> I think this would bypass the whole use of the IN list. I'm still not 100%
> sure why there's 100+ values coming from an outside source. Based on his
> query below it looks like the object_ids in the IN clause are not unique...
>
> Its kind of hard trying to help someone when you don't know the whole
> problem....
>
> -Mike
>
>
> > arindam.bhattacharjee wrote:
> > > Hello Knut,
> > >
> > > Thanks for your quick response. This is a sample database which I have
> > > created just for testing out the performance and has been written to
> > only
> > > once in one go. I tried temp tables but that is just too slow. The IN
> > clause
> > > has values which comes from another source and I can't modify that.
> > >
> > > However, I will try out what you state below. But still, I wanted to
> get
> > > your pulse about whether Derby can respond in sub 100 millisec time
> with
> > the
> > > table sizes you see above?
> > >
> > > I find that:
> > >
> > > select category_master.category_name,
> > count(category_master.category_name)
> > > as category_count
> > > from
> > >     (
> > >             select internal.object_id
> > >             from
> > >             (
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1002) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all .......
> > >                     values(9999)
> > >             ) as internal(object_id)
> > >
> > >     ) as external_ids,
> > >     object_master,
> > >     category_master,
> > >     object_category_mapping
> > > where
> > >     external_ids.object_id = object_master.object_id and
> > >     external_ids.object_id = object_category_mapping.object_id and
> > >     object_master.object_id = object_category_mapping.object_id and
> > >     category_master.category_id = object_category_mapping.category_id
> > > group by
> > >     category_master.category_name
> > > order by
> > >     category_count desc
> > >
> > > is much faster unfortunately connection.prepareStatement() is taking
> way
> > too
> > > much memory (both stack and heap - I have a constraint of 256 MB MAX
> > memory
> > > for my JVM) which goes beyond my applications resources. Is there a way
> > I
> > > can precompile some SQLs which are very expensive to parse during
> > execution.
> > >
> > > Best regards,
> > >
> > > Arindam.
> > >
> > >
> > > Knut Anders Hatlen wrote:
> > >> "arindam.bhattacharjee" <mr...@gmail.com> writes:
> > >>
> > >>> Hello,
> > >>>
> > >>> I would like my query below to return within 100 millisecs. Please
> > help
> > >>> me,
> > >>> and the values for the IN clause comes from outside hence cannot
> > really
> > >>> change the IN clause to a join on an existing table.
> > >> Hi Arindam,
> > >>
> > >> Does the query run faster if you compress all the tables involved, or
> > if
> > >> you drop and recreate all the indexes? If so, it is likely that the
> > >> index cardinality statistics are out of date, which may make the
> > >> optimizer pick a bad execution plan. Currently, index cardinality
> > >> statistics are only updated at index creation time, when tables are
> > >> compressed, and when columns are dropped. A more automatic solution is
> > >> being worked on. For more details, see:
> > >>
> > >> https://issues.apache.org/jira/browse/DERBY-269
> > >> https://issues.apache.org/jira/browse/DERBY-3788
> > >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
> > >>
> > >> You may be experiencing some other problem, but this is a problem that
> > >> keeps coming up, so I think it's worth checking.
> > >>
> > >> Hope this helps,
> > >>
> > >> --
> > >> Knut Anders
> > >>
> > >>
> > >
>
>
>
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

> Lastly, are you now using Derby 10.4 for all of this work, or are
> you continuing to use 10.1?
[Arindam] I will respond to the rest of your questions ASAP. The answer to
the above question is : 10.4. I am using 10.4 now and it is significantly
better than 10.1 both during insertion and during query.

Best regards,

Arindam.

On Wed, Apr 8, 2009 at 8:06 PM, Bryan Pendleton
<bp...@amberpoint.com>wrote:

> problem - thanks to Knut for indicating that as a possibility. When I
>> compress the tables - the quries start performing a LOT faster!
>>
>
> Good, I'm glad to hear that you've made some good progress.
>
> Do you get a different query plan now? Or is it just that the table
> access is more efficient?
>
> Can you capture the runtime statistics information for the 30 second
> query before you compressed the tables, and for the 1.1 second query
> after you compressed the table, and compare them?
>
> You mention that you've run this query on other databases, and they
> are much faster. Can you make any observations about why that might
> be, and what they may be doing differently? For example, can you see
> if the other database is using a different query plan and let us know
> the general information about what that query plan is?
>
> Also, you mention that the query runs quite well with 100 or 500 elements
> in the IN clause, but falls down with 1000 elements in the IN clause.
> Again, I'd be interested to know how the runtime statistics information
> compares between those two cases. Is it that we using a substantially
> different query plan for the larger query? Or is there some other behavior?
>
> Lastly, are you now using Derby 10.4 for all of this work, or are
> you continuing to use 10.1?
>
> thanks,
>
> bryan
>
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Bryan Pendleton <bp...@amberpoint.com>.

> problem - thanks to Knut for indicating that as a possibility. When I 
> compress the tables - the quries start performing a LOT faster! 

Good, I'm glad to hear that you've made some good progress.

Do you get a different query plan now? Or is it just that the table
access is more efficient?

Can you capture the runtime statistics information for the 30 second
query before you compressed the tables, and for the 1.1 second query
after you compressed the table, and compare them?

You mention that you've run this query on other databases, and they
are much faster. Can you make any observations about why that might
be, and what they may be doing differently? For example, can you see
if the other database is using a different query plan and let us know
the general information about what that query plan is?

Also, you mention that the query runs quite well with 100 or 500 elements
in the IN clause, but falls down with 1000 elements in the IN clause.
Again, I'd be interested to know how the runtime statistics information
compares between those two cases. Is it that we using a substantially
different query plan for the larger query? Or is there some other behavior?

Lastly, are you now using Derby 10.4 for all of this work, or are
you continuing to use 10.1?

thanks,

bryan

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Arindam Bhattacharjee wrote:
> Few inputs/observations:
> 
> a) This is not a school project. This is a project which makes money for 
> us today. We have functioning product running today, developed by us 
> which doesn't use any RDBMS's to achieve what I am trying to achieve 
> with the tables. For 200,000 object master we get 45 - 60 millis 
> performance. The reason we are trying to use an RDBMS is to ensure that 
> we can scale even higher and updates are easier and more maintenable. 
> Hence all these questions.
> 
> b) I tried removing the "preserve rows" from the declare temp table 
> clause - and that didn't help me either.
> 
> c) The data base which I have created is just a sample database which 
> vaguely represents the load which we will see if we implement the 
> existing feature of ours, using Derby RDBMS. It doesn't contain REAL data.
> 
For you and others, one way you could help the derby community help you 
is to post complete test cases if possible.   Issues like this could get
more attention if complete programs are submitted that build the db, 
load fake data and include the exact problem queries.  Put it in the
form of a junit test and it may end up in the nightly suite and then it
is likely no future release of derby will break your query.

I am glad the performance increased, but the compress issue is still
a mystery.  I guess it could be a different query plan - posting a 10.4
query plan would be nice.  If in your
initial load you created indexes before inserts, then on average the
tree is likely to 1/2 filled leafs.  Compress and create index on the
other hand fill the leafs completely (the downside is that means more
splits when inserts come again). Posting another space table result
before and after the compress would be interesting.  Compressing also
has the affect of filling up the operating system I/O cache with the
database if you have enough memory on your machine - Derby reads through 
this cache so it could eliminate a lot of real I/O that Derby would
have to do if it were running on a cold machine, ie. after a reboot.

> d) The entire database was created in one shot, as I have stated 
> earlier, and records are populated in one go. That was the actual 
> problem - thanks to Knuth for indicating that as a possibility. When I 
> compress the tables - the quries start performing a LOT faster! 1 
> million in object master, 20 million in object category mapping and 10 K 
> in category master and 1000 in IN clause gives me 1100 millis speed now 
> - down from 30 seconds. That probably explains why the object master 
> index scans were taking nearly 30 seconds in the runtimestats which I 
> posted. Which is slow, but still manageable since the 100 and 500s are 
> much faster. However, that is still a lot slower than other databases we 
> are testing on. Since Derby is embedded we expected it to be much faster 
> since no IPC is required between the client driver and the network server.
> 
I agree in general that the suggested temp join table better fits the
SQL model for 1000's of IN terms.  Note you may get better performance
if you sort the data before you insert it into the temp table - it may
get better cache clustering when joining to indexes on same key in other
tables.

The temp table performance is a mystery.  Again you could help out the
derby community with filing a jira issue with the exact code you are 
using to get this performance.  There is no reason for inserts into
temp table to be slower than regular table.  Underlying temp tables are
the same as regular tables, just with some of the synchronous write
properties turned off and some special handling of the ddl.

> e) IN doesn't function properly beyond 1000 object ids, and we used a 
> regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY). Insertion of 
> 5000 records into this happened in 100 millis (after compression of the 
> tables) and the query took 4 seconds. With executeUpdate() or execute() 
> (as suggested) insertion was at least 200% slower. So the execute batch 
> works better for the inserts into the table for the payloads required by 
> us. The temp table query took 10 MINS (this time I waited patiently for 
> it to finish - there was nothing wrong there - the query just takes too 
> much time and I didn't have patience to wait for it to finish the first 
> time :) ) to execute while the regular table with primary key took just 
> 4 seconds.
> 
It is good to hear 10.4 is better at this.  I know work was done to 
reduce the compile memory usage for really large queries - i don't 
remember in exactly which release but definitely after 10.1.
Again I would
suggest moving to 10.4 for your application.  It is more likely any 
problems found will be fixed and released in 10.4 than 10.1.

> f) The stack overflow exception for "select ... values...union" was much 
> less of a problem in 10.4 compared to 10.1 since the memory usage is 
> tapering off to the -mx<Size In Megs> for the JVM. But, since it cannot 
> be parameterized it isn't of much use to us. IN clause based queries are 
> getting parsed with much lower memory footprints.
> 
> Best regards,
> 
> Arindam.
> 
> On Wed, Apr 8, 2009 at 2:39 AM, <derby@segel.com 
> <ma...@segel.com>> wrote:
> 
> 
>      > -----Original Message-----
>      > From: Mike Matrigali [mailto:mikem_app@sbcglobal.net
>     <ma...@sbcglobal.net>]
>      > Sent: Tuesday, April 07, 2009 2:05 PM
>      > To: Derby Discussion
>      > Subject: Re: URGENT!!! JDBC SQL query taking long time for large
>     IN clause
>      >
>      > It is impossible to say what the performance of the query can be
>     without
>      > knowing exact values of all the values of the IN LIST.  But it is
>      > possible to get some idea assuming some worst case behavior, and from
>      > that I am going to guess you will never come close to 100ms with an
>      > uncached database, on hardware using some sort of standard disk based
>      > hard drive.
>      >
>      > I do think the query may go faster with index and query tweeking, but
>      > 100ms to an uncached db and non-clustered unique values in that
>     IN list
>      > is never going to go that fast.  Adding up just what is posted it
>     looks
>      > like this is a 1.2 gig db.
>      >
>     Drop the unnecessary indexes and you'll see the database size shrink
>     fast.
>     Also note that he's running this on a Windows XP laptop. Depending
>     on the
>     model of the lap top, you will have not only CPU issues but also
>     disk i/o
>     issues as well. (5400 rpm IDE as an example....)
> 
>     However, it is possible for the OP to get better performance, if not
>     realistically 100ms performance. (BTW where did 100ms come from? I'm
>     sorry
>     but this really sounds like a class project...)
> 
>      > You posted the space for the tables and indexes.  The interesting
>     ones
>      > are the big ones.  You have 5 tables or indexes over 1000 pages
>     big.  If
>      > in the worst case your 1000 value IN list happens to be on 1000
>      > different pages then Derby is going to need to do at least 1000
>     i/o's to
>      > get to them - I usually use back of envelope of max 100 i/o's per
>     second
>      > (even if your disk has specs that say higher rate this I/O is not
>      > going to
>      > get streamed as fast as possible by this query, it is going to
>     ask for
>      > page, process it, do some join work then later ask for another
>     page, ...)
>      > :
>      > > CATEGORY_MASTER            0    103    0    0    4096    0
>      > > SQL090406091302600        1    55    0    0    4096    0
>      > > SQL090406091302601        1    160    0    1    4096    0
>      > > SQL090406091302730        1    1    0    1    4096    0
>      > > OBJECT_MASTER            0    10497    0    0    4096    0
>      > > SQL090406091302760        1    5340    0    1    4096    0
>      > > SQL090406091302761        1    16708    0    410    4096    0
>      > > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
>      > > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096
>        0
>      >
> 
>     Mike,
> 
>     I think that a lot of this information is a bit skewed. Outside of the
>     primary index, the indexes he created included the varchar field.
>     Not sure
>     why he did this except under the impression that he'd only have to
>     hit the
>     index and not the underlying table. While there is some potential
>     merit to
>     this, I think that there are things that he can do to improve
>     performance.
>     (Hence my post about reworking the query itself and using a temp table.)
>     Drop those indexes and you'll see a big change in database size.
> 
>      > There was work done in 10.3 on IN-LISTS, making them perform more
>     like
>      > unions,  See DERBY-47.  So if you have a choice of releases I would
>      > suggest you move to 10.4 and post query plan and results against
>     that.
>      > The basic idea of that change was to allow the
>      > system to do 1 probe into an index for each value in the IN-LIST,
>     before
>      > this change DERBY could only sort the values in the IN list and then
>      > limit a index scan to the lowest and biggest values in the in list.
>      > So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it
>     might
>      > have to scan 112177 pages to find the 1000 rows, where worst case for
>      > probing would be 1000 page (plus btree parent index pages, but those
>      > are much more likely cached).  The problem is that there is
>     definitely
>      > overhead for probing one at a time, scans go much faster - so
>     there is
>      > a crossover point - ie. I would guess it would likely better to
>     scan all
>      > 112177 pages then do 100,000 probes.
>      >
>     I believe that it was already recommended that he do just that.
>     There are two ways he could use the temp table. As a sub-select
>     statement,
>     or as part of the table join.
> 
>     I think this would bypass the whole use of the IN list. I'm still
>     not 100%
>     sure why there's 100+ values coming from an outside source. Based on his
>     query below it looks like the object_ids in the IN clause are not
>     unique...
> 
>     Its kind of hard trying to help someone when you don't know the whole
>     problem....
> 
>     -Mike
> 
>      > arindam.bhattacharjee wrote:
>      > > Hello Knut,
>      > >
>      > > Thanks for your quick response. This is a sample database which
>     I have
>      > > created just for testing out the performance and has been
>     written to
>      > only
>      > > once in one go. I tried temp tables but that is just too slow.
>     The IN
>      > clause
>      > > has values which comes from another source and I can't modify that.
>      > >
>      > > However, I will try out what you state below. But still, I
>     wanted to get
>      > > your pulse about whether Derby can respond in sub 100 millisec
>     time with
>      > the
>      > > table sizes you see above?
>      > >
>      > > I find that:
>      > >
>      > > select category_master.category_name,
>      > count(category_master.category_name)
>      > > as category_count
>      > > from
>      > >     (
>      > >             select internal.object_id
>      > >             from
>      > >             (
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1002) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all
>      > >                     values(1001) union all .......
>      > >                     values(9999)
>      > >             ) as internal(object_id)
>      > >
>      > >     ) as external_ids,
>      > >     object_master,
>      > >     category_master,
>      > >     object_category_mapping
>      > > where
>      > >     external_ids.object_id = object_master.object_id and
>      > >     external_ids.object_id = object_category_mapping.object_id and
>      > >     object_master.object_id = object_category_mapping.object_id and
>      > >     category_master.category_id =
>     object_category_mapping.category_id
>      > > group by
>      > >     category_master.category_name
>      > > order by
>      > >     category_count desc
>      > >
>      > > is much faster unfortunately connection.prepareStatement() is
>     taking way
>      > too
>      > > much memory (both stack and heap - I have a constraint of 256
>     MB MAX
>      > memory
>      > > for my JVM) which goes beyond my applications resources. Is
>     there a way
>      > I
>      > > can precompile some SQLs which are very expensive to parse during
>      > execution.
>      > >
>      > > Best regards,
>      > >
>      > > Arindam.
>      > >
>      > >
>      > > Knut Anders Hatlen wrote:
>      > >> "arindam.bhattacharjee" <mr.arindam.bhattacharjee@gmail.com
>     <ma...@gmail.com>> writes:
>      > >>
>      > >>> Hello,
>      > >>>
>      > >>> I would like my query below to return within 100 millisecs.
>     Please
>      > help
>      > >>> me,
>      > >>> and the values for the IN clause comes from outside hence cannot
>      > really
>      > >>> change the IN clause to a join on an existing table.
>      > >> Hi Arindam,
>      > >>
>      > >> Does the query run faster if you compress all the tables
>     involved, or
>      > if
>      > >> you drop and recreate all the indexes? If so, it is likely
>     that the
>      > >> index cardinality statistics are out of date, which may make the
>      > >> optimizer pick a bad execution plan. Currently, index cardinality
>      > >> statistics are only updated at index creation time, when
>     tables are
>      > >> compressed, and when columns are dropped. A more automatic
>     solution is
>      > >> being worked on. For more details, see:
>      > >>
>      > >> https://issues.apache.org/jira/browse/DERBY-269
>      > >> https://issues.apache.org/jira/browse/DERBY-3788
>      > >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
>      > >>
>      > >> You may be experiencing some other problem, but this is a
>     problem that
>      > >> keeps coming up, so I think it's worth checking.
>      > >>
>      > >> Hope this helps,
>      > >>
>      > >> --
>      > >> Knut Anders
>      > >>
>      > >>
>      > >
> 
> 
> 
>

RE: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by de...@segel.com.

Arindam,

Ok,

Just a few things.

1)       Derby is a free, open source, product. Warts and all, its not going
to be your best choice for a commercial rdbms product. (Sorry Derby fans,
you get what you pay for.)

2)       You want speed, you're going to have to consider IBM's SolidDB
which is an in memory database along with Oracle's TimesTen database. This
will give you the speed that you want.

3)       You're running on a laptop where your disk drive could be a 5400
rpm IDE drive. Laptop drive == low energy consumption and low performance.
Disk based solutions will be much slower on a laptop than on a 'comparable'
desktop and/or server.

4)       I'm not sure why you had to load and then compress your tables. I
could understand that if you created your index prior to loading the data
that you could have to update the statistics.

5)       Since we don't know what you're actually trying to do, there could
be a better or more efficient design. Having an IN clause with 1000's of
entries is definitely not a good design for any database. 

6)       You never did answer the question about the object ids that were in
the IN clause. Were they unique or were there duplicates?

HTH

-Mike

  _____  

From: Arindam Bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com] 
Sent: Wednesday, April 08, 2009 2:14 AM
To: Derby Discussion; msegel@segel.com
Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Few inputs/observations:

a) This is not a school project. This is a project which makes money for us
today. We have functioning product running today, developed by us which
doesn't use any RDBMS's to achieve what I am trying to achieve with the
tables. For 200,000 object master we get 45 - 60 millis performance. The
reason we are trying to use an RDBMS is to ensure that we can scale even
higher and updates are easier and more maintenable. Hence all these
questions.

b) I tried removing the "preserve rows" from the declare temp table clause -
and that didn't help me either.

c) The data base which I have created is just a sample database which
vaguely represents the load which we will see if we implement the existing
feature of ours, using Derby RDBMS. It doesn't contain REAL data.

d) The entire database was created in one shot, as I have stated earlier,
and records are populated in one go. That was the actual problem - thanks to
Knuth for indicating that as a possibility. When I compress the tables - the
quries start performing a LOT faster! 1 million in object master, 20 million
in object category mapping and 10 K in category master and 1000 in IN clause
gives me 1100 millis speed now - down from 30 seconds. That probably
explains why the object master index scans were taking nearly 30 seconds in
the runtimestats which I posted. Which is slow, but still manageable since
the 100 and 500s are much faster. However, that is still a lot slower than
other databases we are testing on. Since Derby is embedded we expected it to
be much faster since no IPC is required between the client driver and the
network server. 

e) IN doesn't function properly beyond 1000 object ids, and we used a
regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY). Insertion of 5000
records into this happened in 100 millis (after compression of the tables)
and the query took 4 seconds. With executeUpdate() or execute() (as
suggested) insertion was at least 200% slower. So the execute batch works
better for the inserts into the table for the payloads required by us. The
temp table query took 10 MINS (this time I waited patiently for it to finish
- there was nothing wrong there - the query just takes too much time and I
didn't have patience to wait for it to finish the first time :) ) to execute
while the regular table with primary key took just 4 seconds. 

f) The stack overflow exception for "select ... values...union" was much
less of a problem in 10.4 compared to 10.1 since the memory usage is
tapering off to the -mx<Size In Megs> for the JVM. But, since it cannot be
parameterized it isn't of much use to us. IN clause based queries are
getting parsed with much lower memory footprints.

Best regards,

Arindam.

On Wed, Apr 8, 2009 at 2:39 AM, <de...@segel.com> wrote:

> -----Original Message-----
> From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
> Sent: Tuesday, April 07, 2009 2:05 PM
> To: Derby Discussion

> Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN clause
>

> It is impossible to say what the performance of the query can be without
> knowing exact values of all the values of the IN LIST.  But it is
> possible to get some idea assuming some worst case behavior, and from
> that I am going to guess you will never come close to 100ms with an
> uncached database, on hardware using some sort of standard disk based
> hard drive.
>
> I do think the query may go faster with index and query tweeking, but
> 100ms to an uncached db and non-clustered unique values in that IN list
> is never going to go that fast.  Adding up just what is posted it looks
> like this is a 1.2 gig db.
>

Drop the unnecessary indexes and you'll see the database size shrink fast.
Also note that he's running this on a Windows XP laptop. Depending on the
model of the lap top, you will have not only CPU issues but also disk i/o
issues as well. (5400 rpm IDE as an example....)

However, it is possible for the OP to get better performance, if not
realistically 100ms performance. (BTW where did 100ms come from? I'm sorry
but this really sounds like a class project...)

> You posted the space for the tables and indexes.  The interesting ones
> are the big ones.  You have 5 tables or indexes over 1000 pages big.  If
> in the worst case your 1000 value IN list happens to be on 1000
> different pages then Derby is going to need to do at least 1000 i/o's to
> get to them - I usually use back of envelope of max 100 i/o's per second
> (even if your disk has specs that say higher rate this I/O is not
> going to
> get streamed as fast as possible by this query, it is going to ask for
> page, process it, do some join work then later ask for another page, ...)
> :
> > CATEGORY_MASTER            0    103    0    0    4096    0
> > SQL090406091302600        1    55    0    0    4096    0
> > SQL090406091302601        1    160    0    1    4096    0
> > SQL090406091302730        1    1    0    1    4096    0
> > OBJECT_MASTER            0    10497    0    0    4096    0
> > SQL090406091302760        1    5340    0    1    4096    0
> > SQL090406091302761        1    16708    0    410    4096    0
> > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
> > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0
>

Mike,

I think that a lot of this information is a bit skewed. Outside of the
primary index, the indexes he created included the varchar field. Not sure
why he did this except under the impression that he'd only have to hit the
index and not the underlying table. While there is some potential merit to
this, I think that there are things that he can do to improve performance.
(Hence my post about reworking the query itself and using a temp table.)
Drop those indexes and you'll see a big change in database size.

> There was work done in 10.3 on IN-LISTS, making them perform more like
> unions,  See DERBY-47.  So if you have a choice of releases I would
> suggest you move to 10.4 and post query plan and results against that.
> The basic idea of that change was to allow the
> system to do 1 probe into an index for each value in the IN-LIST, before
> this change DERBY could only sort the values in the IN list and then
> limit a index scan to the lowest and biggest values in the in list.
> So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
> have to scan 112177 pages to find the 1000 rows, where worst case for
> probing would be 1000 page (plus btree parent index pages, but those
> are much more likely cached).  The problem is that there is definitely
> overhead for probing one at a time, scans go much faster - so there is
> a crossover point - ie. I would guess it would likely better to scan all
> 112177 pages then do 100,000 probes.
>

I believe that it was already recommended that he do just that.
There are two ways he could use the temp table. As a sub-select statement,
or as part of the table join.

I think this would bypass the whole use of the IN list. I'm still not 100%
sure why there's 100+ values coming from an outside source. Based on his
query below it looks like the object_ids in the IN clause are not unique...

Its kind of hard trying to help someone when you don't know the whole
problem....

-Mike

> arindam.bhattacharjee wrote:
> > Hello Knut,
> >
> > Thanks for your quick response. This is a sample database which I have
> > created just for testing out the performance and has been written to
> only
> > once in one go. I tried temp tables but that is just too slow. The IN
> clause
> > has values which comes from another source and I can't modify that.
> >
> > However, I will try out what you state below. But still, I wanted to get
> > your pulse about whether Derby can respond in sub 100 millisec time with
> the
> > table sizes you see above?
> >
> > I find that:
> >
> > select category_master.category_name,
> count(category_master.category_name)
> > as category_count
> > from
> >     (
> >             select internal.object_id
> >             from
> >             (
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1002) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all
> >                     values(1001) union all .......
> >                     values(9999)
> >             ) as internal(object_id)
> >
> >     ) as external_ids,
> >     object_master,
> >     category_master,
> >     object_category_mapping
> > where
> >     external_ids.object_id = object_master.object_id and
> >     external_ids.object_id = object_category_mapping.object_id and
> >     object_master.object_id = object_category_mapping.object_id and
> >     category_master.category_id = object_category_mapping.category_id
> > group by
> >     category_master.category_name
> > order by
> >     category_count desc
> >
> > is much faster unfortunately connection.prepareStatement() is taking way
> too
> > much memory (both stack and heap - I have a constraint of 256 MB MAX
> memory
> > for my JVM) which goes beyond my applications resources. Is there a way
> I
> > can precompile some SQLs which are very expensive to parse during
> execution.
> >
> > Best regards,
> >
> > Arindam.
> >
> >
> > Knut Anders Hatlen wrote:
> >> "arindam.bhattacharjee" <mr...@gmail.com> writes:
> >>
> >>> Hello,
> >>>
> >>> I would like my query below to return within 100 millisecs. Please
> help
> >>> me,
> >>> and the values for the IN clause comes from outside hence cannot
> really
> >>> change the IN clause to a join on an existing table.
> >> Hi Arindam,
> >>
> >> Does the query run faster if you compress all the tables involved, or
> if
> >> you drop and recreate all the indexes? If so, it is likely that the
> >> index cardinality statistics are out of date, which may make the
> >> optimizer pick a bad execution plan. Currently, index cardinality
> >> statistics are only updated at index creation time, when tables are
> >> compressed, and when columns are dropped. A more automatic solution is
> >> being worked on. For more details, see:
> >>
> >> https://issues.apache.org/jira/browse/DERBY-269
> >> https://issues.apache.org/jira/browse/DERBY-3788
> >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
> >>
> >> You may be experiencing some other problem, but this is a problem that
> >> keeps coming up, so I think it's worth checking.
> >>
> >> Hope this helps,
> >>
> >> --
> >> Knut Anders
> >>
> >>
> >

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

Few inputs/observations:

a) This is not a school project. This is a project which makes money for us
today. We have functioning product running today, developed by us which
doesn't use any RDBMS's to achieve what I am trying to achieve with the
tables. For 200,000 object master we get 45 - 60 millis performance. The
reason we are trying to use an RDBMS is to ensure that we can scale even
higher and updates are easier and more maintenable. Hence all these
questions.

b) I tried removing the "preserve rows" from the declare temp table clause -
and that didn't help me either.

c) The data base which I have created is just a sample database which
vaguely represents the load which we will see if we implement the existing
feature of ours, using Derby RDBMS. It doesn't contain REAL data.

d) The entire database was created in one shot, as I have stated earlier,
and records are populated in one go. That was the actual problem - thanks to
Knuth for indicating that as a possibility. When I compress the tables - the
quries start performing a LOT faster! 1 million in object master, 20 million
in object category mapping and 10 K in category master and 1000 in IN clause
gives me 1100 millis speed now - down from 30 seconds. That probably
explains why the object master index scans were taking nearly 30 seconds in
the runtimestats which I posted. Which is slow, but still manageable since
the 100 and 500s are much faster. However, that is still a lot slower than
other databases we are testing on. Since Derby is embedded we expected it to
be much faster since no IPC is required between the client driver and the
network server.

e) IN doesn't function properly beyond 1000 object ids, and we used a
regular table CREATE TABLE DUMMY (ID INTEGER PRIMARY KEY). Insertion of 5000
records into this happened in 100 millis (after compression of the tables)
and the query took 4 seconds. With executeUpdate() or execute() (as
suggested) insertion was at least 200% slower. So the execute batch works
better for the inserts into the table for the payloads required by us. The
temp table query took 10 MINS (this time I waited patiently for it to finish
- there was nothing wrong there - the query just takes too much time and I
didn't have patience to wait for it to finish the first time :) ) to execute
while the regular table with primary key took just 4 seconds.

f) The stack overflow exception for "select ... values...union" was much
less of a problem in 10.4 compared to 10.1 since the memory usage is
tapering off to the -mx<Size In Megs> for the JVM. But, since it cannot be
parameterized it isn't of much use to us. IN clause based queries are
getting parsed with much lower memory footprints.

Best regards,

Arindam.

On Wed, Apr 8, 2009 at 2:39 AM, <de...@segel.com> wrote:

>
> > -----Original Message-----
> > From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
> > Sent: Tuesday, April 07, 2009 2:05 PM
> > To: Derby Discussion
> > Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN
> clause
> >
> > It is impossible to say what the performance of the query can be without
> > knowing exact values of all the values of the IN LIST.  But it is
> > possible to get some idea assuming some worst case behavior, and from
> > that I am going to guess you will never come close to 100ms with an
> > uncached database, on hardware using some sort of standard disk based
> > hard drive.
> >
> > I do think the query may go faster with index and query tweeking, but
> > 100ms to an uncached db and non-clustered unique values in that IN list
> > is never going to go that fast.  Adding up just what is posted it looks
> > like this is a 1.2 gig db.
> >
> Drop the unnecessary indexes and you'll see the database size shrink fast.
> Also note that he's running this on a Windows XP laptop. Depending on the
> model of the lap top, you will have not only CPU issues but also disk i/o
> issues as well. (5400 rpm IDE as an example....)
>
> However, it is possible for the OP to get better performance, if not
> realistically 100ms performance. (BTW where did 100ms come from? I'm sorry
> but this really sounds like a class project...)
>
> > You posted the space for the tables and indexes.  The interesting ones
> > are the big ones.  You have 5 tables or indexes over 1000 pages big.  If
> > in the worst case your 1000 value IN list happens to be on 1000
> > different pages then Derby is going to need to do at least 1000 i/o's to
> > get to them - I usually use back of envelope of max 100 i/o's per second
> > (even if your disk has specs that say higher rate this I/O is not
> > going to
> > get streamed as fast as possible by this query, it is going to ask for
> > page, process it, do some join work then later ask for another page, ...)
> > :
> > > CATEGORY_MASTER            0    103    0    0    4096    0
> > > SQL090406091302600        1    55    0    0    4096    0
> > > SQL090406091302601        1    160    0    1    4096    0
> > > SQL090406091302730        1    1    0    1    4096    0
> > > OBJECT_MASTER            0    10497    0    0    4096    0
> > > SQL090406091302760        1    5340    0    1    4096    0
> > > SQL090406091302761        1    16708    0    410    4096    0
> > > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
> > > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0
> >
>
> Mike,
>
> I think that a lot of this information is a bit skewed. Outside of the
> primary index, the indexes he created included the varchar field. Not sure
> why he did this except under the impression that he'd only have to hit the
> index and not the underlying table. While there is some potential merit to
> this, I think that there are things that he can do to improve performance.
> (Hence my post about reworking the query itself and using a temp table.)
> Drop those indexes and you'll see a big change in database size.
>
> > There was work done in 10.3 on IN-LISTS, making them perform more like
> > unions,  See DERBY-47.  So if you have a choice of releases I would
> > suggest you move to 10.4 and post query plan and results against that.
> > The basic idea of that change was to allow the
> > system to do 1 probe into an index for each value in the IN-LIST, before
> > this change DERBY could only sort the values in the IN list and then
> > limit a index scan to the lowest and biggest values in the in list.
> > So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
> > have to scan 112177 pages to find the 1000 rows, where worst case for
> > probing would be 1000 page (plus btree parent index pages, but those
> > are much more likely cached).  The problem is that there is definitely
> > overhead for probing one at a time, scans go much faster - so there is
> > a crossover point - ie. I would guess it would likely better to scan all
> > 112177 pages then do 100,000 probes.
> >
> I believe that it was already recommended that he do just that.
> There are two ways he could use the temp table. As a sub-select statement,
> or as part of the table join.
>
> I think this would bypass the whole use of the IN list. I'm still not 100%
> sure why there's 100+ values coming from an outside source. Based on his
> query below it looks like the object_ids in the IN clause are not unique...
>
> Its kind of hard trying to help someone when you don't know the whole
> problem....
>
> -Mike
>
> > arindam.bhattacharjee wrote:
> > > Hello Knut,
> > >
> > > Thanks for your quick response. This is a sample database which I have
> > > created just for testing out the performance and has been written to
> > only
> > > once in one go. I tried temp tables but that is just too slow. The IN
> > clause
> > > has values which comes from another source and I can't modify that.
> > >
> > > However, I will try out what you state below. But still, I wanted to
> get
> > > your pulse about whether Derby can respond in sub 100 millisec time
> with
> > the
> > > table sizes you see above?
> > >
> > > I find that:
> > >
> > > select category_master.category_name,
> > count(category_master.category_name)
> > > as category_count
> > > from
> > >     (
> > >             select internal.object_id
> > >             from
> > >             (
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1002) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all
> > >                     values(1001) union all .......
> > >                     values(9999)
> > >             ) as internal(object_id)
> > >
> > >     ) as external_ids,
> > >     object_master,
> > >     category_master,
> > >     object_category_mapping
> > > where
> > >     external_ids.object_id = object_master.object_id and
> > >     external_ids.object_id = object_category_mapping.object_id and
> > >     object_master.object_id = object_category_mapping.object_id and
> > >     category_master.category_id = object_category_mapping.category_id
> > > group by
> > >     category_master.category_name
> > > order by
> > >     category_count desc
> > >
> > > is much faster unfortunately connection.prepareStatement() is taking
> way
> > too
> > > much memory (both stack and heap - I have a constraint of 256 MB MAX
> > memory
> > > for my JVM) which goes beyond my applications resources. Is there a way
> > I
> > > can precompile some SQLs which are very expensive to parse during
> > execution.
> > >
> > > Best regards,
> > >
> > > Arindam.
> > >
> > >
> > > Knut Anders Hatlen wrote:
> > >> "arindam.bhattacharjee" <mr...@gmail.com> writes:
> > >>
> > >>> Hello,
> > >>>
> > >>> I would like my query below to return within 100 millisecs. Please
> > help
> > >>> me,
> > >>> and the values for the IN clause comes from outside hence cannot
> > really
> > >>> change the IN clause to a join on an existing table.
> > >> Hi Arindam,
> > >>
> > >> Does the query run faster if you compress all the tables involved, or
> > if
> > >> you drop and recreate all the indexes? If so, it is likely that the
> > >> index cardinality statistics are out of date, which may make the
> > >> optimizer pick a bad execution plan. Currently, index cardinality
> > >> statistics are only updated at index creation time, when tables are
> > >> compressed, and when columns are dropped. A more automatic solution is
> > >> being worked on. For more details, see:
> > >>
> > >> https://issues.apache.org/jira/browse/DERBY-269
> > >> https://issues.apache.org/jira/browse/DERBY-3788
> > >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
> > >>
> > >> You may be experiencing some other problem, but this is a problem that
> > >> keeps coming up, so I think it's worth checking.
> > >>
> > >> Hope this helps,
> > >>
> > >> --
> > >> Knut Anders
> > >>
> > >>
> > >
>
>
>
>

RE: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by de...@segel.com.

> -----Original Message-----
> From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]
> Sent: Tuesday, April 07, 2009 2:05 PM
> To: Derby Discussion
> Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN clause
> 
> It is impossible to say what the performance of the query can be without
> knowing exact values of all the values of the IN LIST.  But it is
> possible to get some idea assuming some worst case behavior, and from
> that I am going to guess you will never come close to 100ms with an
> uncached database, on hardware using some sort of standard disk based
> hard drive.
> 
> I do think the query may go faster with index and query tweeking, but
> 100ms to an uncached db and non-clustered unique values in that IN list
> is never going to go that fast.  Adding up just what is posted it looks
> like this is a 1.2 gig db.
> 
Drop the unnecessary indexes and you'll see the database size shrink fast.
Also note that he's running this on a Windows XP laptop. Depending on the
model of the lap top, you will have not only CPU issues but also disk i/o
issues as well. (5400 rpm IDE as an example....)

However, it is possible for the OP to get better performance, if not
realistically 100ms performance. (BTW where did 100ms come from? I'm sorry
but this really sounds like a class project...)

> You posted the space for the tables and indexes.  The interesting ones
> are the big ones.  You have 5 tables or indexes over 1000 pages big.  If
> in the worst case your 1000 value IN list happens to be on 1000
> different pages then Derby is going to need to do at least 1000 i/o's to
> get to them - I usually use back of envelope of max 100 i/o's per second
> (even if your disk has specs that say higher rate this I/O is not
> going to
> get streamed as fast as possible by this query, it is going to ask for
> page, process it, do some join work then later ask for another page, ...)
> :
> > CATEGORY_MASTER            0    103    0    0    4096    0
> > SQL090406091302600        1    55    0    0    4096    0
> > SQL090406091302601        1    160    0    1    4096    0
> > SQL090406091302730        1    1    0    1    4096    0
> > OBJECT_MASTER            0    10497    0    0    4096    0
> > SQL090406091302760        1    5340    0    1    4096    0
> > SQL090406091302761        1    16708    0    410    4096    0
> > OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
> > OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0
> 

Mike,

I think that a lot of this information is a bit skewed. Outside of the
primary index, the indexes he created included the varchar field. Not sure
why he did this except under the impression that he'd only have to hit the
index and not the underlying table. While there is some potential merit to
this, I think that there are things that he can do to improve performance.
(Hence my post about reworking the query itself and using a temp table.)
Drop those indexes and you'll see a big change in database size.

> There was work done in 10.3 on IN-LISTS, making them perform more like
> unions,  See DERBY-47.  So if you have a choice of releases I would
> suggest you move to 10.4 and post query plan and results against that.
> The basic idea of that change was to allow the
> system to do 1 probe into an index for each value in the IN-LIST, before
> this change DERBY could only sort the values in the IN list and then
> limit a index scan to the lowest and biggest values in the in list.
> So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
> have to scan 112177 pages to find the 1000 rows, where worst case for
> probing would be 1000 page (plus btree parent index pages, but those
> are much more likely cached).  The problem is that there is definitely
> overhead for probing one at a time, scans go much faster - so there is
> a crossover point - ie. I would guess it would likely better to scan all
> 112177 pages then do 100,000 probes.
> 
I believe that it was already recommended that he do just that.
There are two ways he could use the temp table. As a sub-select statement,
or as part of the table join.

I think this would bypass the whole use of the IN list. I'm still not 100%
sure why there's 100+ values coming from an outside source. Based on his
query below it looks like the object_ids in the IN clause are not unique...

Its kind of hard trying to help someone when you don't know the whole
problem.... 

-Mike

> arindam.bhattacharjee wrote:
> > Hello Knut,
> >
> > Thanks for your quick response. This is a sample database which I have
> > created just for testing out the performance and has been written to
> only
> > once in one go. I tried temp tables but that is just too slow. The IN
> clause
> > has values which comes from another source and I can't modify that.
> >
> > However, I will try out what you state below. But still, I wanted to get
> > your pulse about whether Derby can respond in sub 100 millisec time with
> the
> > table sizes you see above?
> >
> > I find that:
> >
> > select category_master.category_name,
> count(category_master.category_name)
> > as category_count
> > from
> > 	(
> > 		select internal.object_id
> > 		from
> > 		(
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1002) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all
> > 			values(1001) union all .......
> > 			values(9999)
> > 		) as internal(object_id)
> >
> > 	) as external_ids,
> > 	object_master,
> > 	category_master,
> > 	object_category_mapping
> > where
> > 	external_ids.object_id = object_master.object_id and
> > 	external_ids.object_id = object_category_mapping.object_id and
> > 	object_master.object_id = object_category_mapping.object_id and
> > 	category_master.category_id = object_category_mapping.category_id
> > group by
> > 	category_master.category_name
> > order by
> > 	category_count desc
> >
> > is much faster unfortunately connection.prepareStatement() is taking way
> too
> > much memory (both stack and heap - I have a constraint of 256 MB MAX
> memory
> > for my JVM) which goes beyond my applications resources. Is there a way
> I
> > can precompile some SQLs which are very expensive to parse during
> execution.
> >
> > Best regards,
> >
> > Arindam.
> >
> >
> > Knut Anders Hatlen wrote:
> >> "arindam.bhattacharjee" <mr...@gmail.com> writes:
> >>
> >>> Hello,
> >>>
> >>> I would like my query below to return within 100 millisecs. Please
> help
> >>> me,
> >>> and the values for the IN clause comes from outside hence cannot
> really
> >>> change the IN clause to a join on an existing table.
> >> Hi Arindam,
> >>
> >> Does the query run faster if you compress all the tables involved, or
> if
> >> you drop and recreate all the indexes? If so, it is likely that the
> >> index cardinality statistics are out of date, which may make the
> >> optimizer pick a bad execution plan. Currently, index cardinality
> >> statistics are only updated at index creation time, when tables are
> >> compressed, and when columns are dropped. A more automatic solution is
> >> being worked on. For more details, see:
> >>
> >> https://issues.apache.org/jira/browse/DERBY-269
> >> https://issues.apache.org/jira/browse/DERBY-3788
> >> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
> >>
> >> You may be experiencing some other problem, but this is a problem that
> >> keeps coming up, so I think it's worth checking.
> >>
> >> Hope this helps,
> >>
> >> --
> >> Knut Anders
> >>
> >>
> >

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Mike Matrigali <mi...@sbcglobal.net>.

It is impossible to say what the performance of the query can be without
knowing exact values of all the values of the IN LIST.  But it is 
possible to get some idea assuming some worst case behavior, and from 
that I am going to guess you will never come close to 100ms with an 
uncached database, on hardware using some sort of standard disk based
hard drive.

I do think the query may go faster with index and query tweeking, but
100ms to an uncached db and non-clustered unique values in that IN list
is never going to go that fast.  Adding up just what is posted it looks
like this is a 1.2 gig db.

You posted the space for the tables and indexes.  The interesting ones
are the big ones.  You have 5 tables or indexes over 1000 pages big.  If 
in the worst case your 1000 value IN list happens to be on 1000 
different pages then Derby is going to need to do at least 1000 i/o's to 
get to them - I usually use back of envelope of max 100 i/o's per second
(even if your disk has specs that say higher rate this I/O is not
going to
get streamed as fast as possible by this query, it is going to ask for
page, process it, do some join work then later ask for another page, ...) :
> CATEGORY_MASTER            0    103    0    0    4096    0
> SQL090406091302600        1    55    0    0    4096    0
> SQL090406091302601        1    160    0    1    4096    0
> SQL090406091302730        1    1    0    1    4096    0
> OBJECT_MASTER            0    10497    0    0    4096    0
> SQL090406091302760        1    5340    0    1    4096    0
> SQL090406091302761        1    16708    0    410    4096    0
> OBJECT_CATEGORY_MAPPING        0    150794    0    0    4096    0
> OBJECT_CATEGORY_MAPPING_INDEX    1    112177    0    57    4096    0

There was work done in 10.3 on IN-LISTS, making them perform more like 
unions,  See DERBY-47.  So if you have a choice of releases I would 
suggest you move to 10.4 and post query plan and results against that.
The basic idea of that change was to allow the
system to do 1 probe into an index for each value in the IN-LIST, before
this change DERBY could only sort the values in the IN list and then 
limit a index scan to the lowest and biggest values in the in list.
So for instance for OBJECT_CATEGORY_MAPPING_INDEX, worst case it might
have to scan 112177 pages to find the 1000 rows, where worst case for
probing would be 1000 page (plus btree parent index pages, but those
are much more likely cached).  The problem is that there is definitely
overhead for probing one at a time, scans go much faster - so there is
a crossover point - ie. I would guess it would likely better to scan all
112177 pages then do 100,000 probes.

arindam.bhattacharjee wrote:
> Hello Knut,
> 
> Thanks for your quick response. This is a sample database which I have
> created just for testing out the performance and has been written to only
> once in one go. I tried temp tables but that is just too slow. The IN clause
> has values which comes from another source and I can't modify that.
> 
> However, I will try out what you state below. But still, I wanted to get
> your pulse about whether Derby can respond in sub 100 millisec time with the
> table sizes you see above? 
> 
> I find that:
> 
> select category_master.category_name, count(category_master.category_name)
> as category_count
> from
> 	(
> 		select internal.object_id
> 		from
> 		(
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1002) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all
> 			values(1001) union all .......
> 			values(9999)
> 		) as internal(object_id)
> 
> 	) as external_ids,
> 	object_master,
> 	category_master,
> 	object_category_mapping
> where
> 	external_ids.object_id = object_master.object_id and
> 	external_ids.object_id = object_category_mapping.object_id and
> 	object_master.object_id = object_category_mapping.object_id and
> 	category_master.category_id = object_category_mapping.category_id
> group by
> 	category_master.category_name
> order by
> 	category_count desc
> 
> is much faster unfortunately connection.prepareStatement() is taking way too
> much memory (both stack and heap - I have a constraint of 256 MB MAX memory
> for my JVM) which goes beyond my applications resources. Is there a way I
> can precompile some SQLs which are very expensive to parse during execution.
> 
> Best regards,
> 
> Arindam.
> 
> 
> Knut Anders Hatlen wrote:
>> "arindam.bhattacharjee" <mr...@gmail.com> writes:
>>
>>> Hello,
>>>
>>> I would like my query below to return within 100 millisecs. Please help
>>> me,
>>> and the values for the IN clause comes from outside hence cannot really
>>> change the IN clause to a join on an existing table.
>> Hi Arindam,
>>
>> Does the query run faster if you compress all the tables involved, or if
>> you drop and recreate all the indexes? If so, it is likely that the
>> index cardinality statistics are out of date, which may make the
>> optimizer pick a bad execution plan. Currently, index cardinality
>> statistics are only updated at index creation time, when tables are
>> compressed, and when columns are dropped. A more automatic solution is
>> being worked on. For more details, see:
>>
>> https://issues.apache.org/jira/browse/DERBY-269
>> https://issues.apache.org/jira/browse/DERBY-3788
>> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
>>
>> You may be experiencing some other problem, but this is a problem that
>> keeps coming up, so I think it's worth checking.
>>
>> Hope this helps,
>>
>> -- 
>> Knut Anders
>>
>>
>

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by "arindam.bhattacharjee" <mr...@gmail.com>.

Hello Knut,

Thanks for your quick response. This is a sample database which I have
created just for testing out the performance and has been written to only
once in one go. I tried temp tables but that is just too slow. The IN clause
has values which comes from another source and I can't modify that.

However, I will try out what you state below. But still, I wanted to get
your pulse about whether Derby can respond in sub 100 millisec time with the
table sizes you see above? 

I find that:

select category_master.category_name, count(category_master.category_name)
as category_count
from
	(
		select internal.object_id
		from
		(
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1002) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all
			values(1001) union all .......
			values(9999)
		) as internal(object_id)

	) as external_ids,
	object_master,
	category_master,
	object_category_mapping
where
	external_ids.object_id = object_master.object_id and
	external_ids.object_id = object_category_mapping.object_id and
	object_master.object_id = object_category_mapping.object_id and
	category_master.category_id = object_category_mapping.category_id
group by
	category_master.category_name
order by
	category_count desc

is much faster unfortunately connection.prepareStatement() is taking way too
much memory (both stack and heap - I have a constraint of 256 MB MAX memory
for my JVM) which goes beyond my applications resources. Is there a way I
can precompile some SQLs which are very expensive to parse during execution.

Best regards,

Arindam.


Knut Anders Hatlen wrote:
> 
> "arindam.bhattacharjee" <mr...@gmail.com> writes:
> 
>> Hello,
>>
>> I would like my query below to return within 100 millisecs. Please help
>> me,
>> and the values for the IN clause comes from outside hence cannot really
>> change the IN clause to a join on an existing table.
> 
> Hi Arindam,
> 
> Does the query run faster if you compress all the tables involved, or if
> you drop and recreate all the indexes? If so, it is likely that the
> index cardinality statistics are out of date, which may make the
> optimizer pick a bad execution plan. Currently, index cardinality
> statistics are only updated at index creation time, when tables are
> compressed, and when columns are dropped. A more automatic solution is
> being worked on. For more details, see:
> 
> https://issues.apache.org/jira/browse/DERBY-269
> https://issues.apache.org/jira/browse/DERBY-3788
> http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html
> 
> You may be experiencing some other problem, but this is a problem that
> keeps coming up, so I think it's worth checking.
> 
> Hope this helps,
> 
> -- 
> Knut Anders
> 
> 

-- 
View this message in context: http://www.nabble.com/URGENT%21%21%21-JDBC-SQL-query-taking-long-time-for-large-IN-clause-tp22927332p22930750.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.

"arindam.bhattacharjee" <mr...@gmail.com> writes:

> Hello,
>
> I would like my query below to return within 100 millisecs. Please help me,
> and the values for the IN clause comes from outside hence cannot really
> change the IN clause to a join on an existing table.

Hi Arindam,

Does the query run faster if you compress all the tables involved, or if
you drop and recreate all the indexes? If so, it is likely that the
index cardinality statistics are out of date, which may make the
optimizer pick a bad execution plan. Currently, index cardinality
statistics are only updated at index creation time, when tables are
compressed, and when columns are dropped. A more automatic solution is
being worked on. For more details, see:

https://issues.apache.org/jira/browse/DERBY-269
https://issues.apache.org/jira/browse/DERBY-3788
http://db.apache.org/derby/docs/10.4/tuning/ctunstats18908.html

You may be experiencing some other problem, but this is a problem that
keeps coming up, so I think it's worth checking.

Hope this helps,

-- 
Knut Anders

RE: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by de...@segel.com.

Ok.

 

1)       Why are you using 'on commit preserve rows?

2)       Why are you using executeBatch()

3)       What does your query look like when you use the temp table?

 

With respect to #1: You don't need to preserve the rows outside of your
current transaction. You shouldn't be in a transaction, but since you're
doing the inserts in to the temp table you should have something like the
following:

BEGIN WORK:

     Insert data in to temp

     Run query

END WORK;

Clean up and quit.

 

Your data need only persist for the running of your select statement and
when you're done, it should be cleaned up. Otherwise when you run multiple
transactions, you'll have to truncate your temp table each time.

 

With respect to #2: 

            Huh? Ok, why are you running executeBatch()?

Here's the url to the API:
http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Statement.html#executeBatch
()

All you are doing is running the same insert statement multiple times. Not a
batch or block of different SQL statements. 

In an iterated loop, run either execute() or executeUpdate(). If you've
prepared your insert statement this should be fairly quick.

Also how many object_ids are in the list and are they all unique? 

 

With respect to #3:

I have to wonder why your query jumped up to not being completed. 

Something isn't right.

 

 

Since I don't know much about you, I have to ask. is this a school
assignment? 

 

-Mike

 

 

 

  _____  

From: Arindam Bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com] 
Sent: Tuesday, April 07, 2009 12:38 PM
To: Derby Discussion; msegel@segel.com
Subject: Re: URGENT!!! JDBC SQL query taking long time for large IN clause

 

I created the indexes one by one just to see whether the performance
improves or not. I don't have much knowledge of databases - I agree. I also
created temp table using the following command:

declare global temporary table temp_object_ids (object_id int) on commit
preserve rows not logged 

Inserting 1000 records took around 800 millis with prepared insert statement
and executeBatch().

I just didn't have the patience to wait for the query to end - I waited for
around 5 mins before killing it - it just kept running endlessly.

I tried removing the object_master from the select clause - it was slower
than the query which I posted. The fastest query till now has been the one
with union - unfortunately, it just hogs too much memory during query
parsing.

Best regards,

Arindam.

On Tue, Apr 7, 2009 at 8:52 PM, <de...@segel.com> wrote:

Wow, this really sounds like a class project.
But since you've asked... lets look at a couple of things.

First your query...
That is a lot of values in the IN Clause. It will hurt performance on pretty
much most databases. Derby had an issue with performance and IN, but I think
they've fixed it as best they could about a year or two ago. (Sorry I'm a
little fuzzy when it comes to time. Its all relative. ;-)

Second, with your query, its better if your format the query so that it's
easier to read. Trust me, it saves a lot of time when you have to look at a
lot of queries and when you have visual cues you can get a better feel for
your query.

But let's take a step back and look at what you have...

In simple terms, you have a basic object table, and a category table which
you want to map. An object can exist in more than one category and you want
to be able to retrieve the objects based on a long list of possible
categories.

Starting with your tables and Indexes...

> CREATE TABLE APP.OBJECT_MASTER (
>     OBJECT_ID INTEGER NOT NULL,
>     OBJECT_CUID VARCHAR(32) NOT NULL,
>     PRIMARY KEY (OBJECT_ID)
> );

Then you create this:

> CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> (OBJECT_ID,OBJECT_CUID)

Ok, the first question is why? Is it that you want to only hit the index and
not have to read a row from the table? FORGET ABOUT IT! At least for now...

You already have a primary key on object_id and this second key will pretty
much duplicate the table and you're doubling your storage requirements.


> CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
>     OBJECT_ID INTEGER NOT NULL,
>     CATEGORY_ID INTEGER NOT NULL
> );

> CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)

Note that since your query is looking at the objects and then wants the
categories, this index is ok.

And finally:

> CREATE TABLE APP.CATEGORY_MASTER (
>     CATEGORY_ID INTEGER NOT NULL,
>     CATEGORY_NAME VARCHAR(255) NOT NULL,
>     PARENT_ID INTEGER,
>     PRIMARY KEY (CATEGORY_ID)
> );

> CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> (CATEGORY_ID,CATEGORY_NAME)

Again this index is redundant and all its really going to do is double your
storage requirements.

Now lets look at your query...
I've reformatted it so that its easier to read:

SELECT  B.category_name,
       COUNT(B.category_name) AS category_count
FROM    object_master A,
       category_master B,
       object_category_mapping C
WHERE   A.object_id = C.object_id
AND     C.category_id = B.category_id
AND     A.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
[SNIP a lot of ?]      )
GROUP BY      category_master.category_name
ORDER BY      category_count DESC

<RANT> One should always try to format their queries in to a readable
pattern. This makes debugging and tuning easier. So take the extra 30
seconds and clean up your code! If you use something like JIndent to format
your java code, cleaning up your query syntax will make maintaining your
code easier. </RANT>


OK, NOW THAT YOU CAN READ YOUR CODE, CAN YOU SEE SOME PROBLEMS?

1) Why do you include your object_master table in your query?

You said that your object_ids are coming from an external source. So why do
you need to join to the object_master table? You already have your
object_ids.

Rewritten, this query should help...
SELECT  B.category_name,
       COUNT(B.category_name) AS category_count
FROM    category_master B,
       object_category_mapping C
WHERE   C.category_id = B.category_id
AND     C.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
[SNIP a lot of ?]      )
GROUP BY      category_master.category_name
ORDER BY      category_count DESC

Note: Now you're joining 2 tables B and C and you're limiting you C based on
a really, really ugly and long IN Clause.

2) There's something you're not telling us.

It looks like you're generating this query on the fly. Because its dynamic,
you're going to incur a bit of overhead in that each time you run the query,
it will have to be prepared and then executed.

Since we don't know much about your application, here's an alternative
suggestion ... Look at using a temp table instead of the large IN clause.

That is, create a temp table at the start of your app, prior to running the
query, you will want to truncate the temp table and then populate it with
the object_ids.

Then your query, which you can now prepare, would look like this...
SELECT  B.category_name,
       COUNT(B.category_name) AS category_count
FROM    category_master B,
       object_category_mapping C
WHERE   C.category_id = B.category_id
AND     C.object_id IN(
       SELECT object_id
       FROM   temp_table)
GROUP BY      B.category_name
ORDER BY      category_count DESC

This should improve the performance of your query.

I would suggest looking at the time it takes to run the query using the IN
CLAUSE versus the time it takes to truncate/populate the temp table and then
run the query.

Getting back to your use of the alternative index, you can add the indexes
afterwards and then compare performances. I don't think that they'll have as
major of an impact as you think.

HTH

-Mike


> -----Original Message-----
> From: arindam.bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com]
> Sent: Tuesday, April 07, 2009 6:44 AM
> To: derby-user@db.apache.org
> Subject: URGENT!!! JDBC SQL query taking long time for large IN clause
>
>
> Hello,
>
> I would like my query below to return within 100 millisecs. Please help
> me,
> and the values for the IN clause comes from outside hence cannot really
> change the IN clause to a join on an existing table.
>
> time taken to prepare statement = 0
> time taken execute statement = 33375
> time taken enumerate result set = 63
>
> DDLs:
>
> CREATE TABLE APP.OBJECT_MASTER (
>     OBJECT_ID INTEGER NOT NULL,
>     OBJECT_CUID VARCHAR(32) NOT NULL,
>     PRIMARY KEY (OBJECT_ID)
> );
>
> CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
>     OBJECT_ID INTEGER NOT NULL,
>     CATEGORY_ID INTEGER NOT NULL
> );
>
> CREATE TABLE APP.CATEGORY_MASTER (
>     CATEGORY_ID INTEGER NOT NULL,
>     CATEGORY_NAME VARCHAR(255) NOT NULL,
>     PARENT_ID INTEGER,
>     PRIMARY KEY (CATEGORY_ID)
> );
>
> CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> (OBJECT_ID,OBJECT_CUID)
>
> CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)
>
> CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> (CATEGORY_ID,CATEGORY_NAME)
>
> Data inside the tables:
> OBJECT_MASTER = 1,000,000 rows
> CATEGORY_MASTER = 10,000 rows
> OBJECT_CATEGORY_MAPPING = 20,000,000 rows
>
> SYSCS_GET_RUNTIMESTATISTICS() output:
>
>
> Statement Name:
>     null
> Statement Text:
>      select category_master.category_name,
> count(category_master.category_name) as category_count  from
> object_master,
> category_master,  object_category_mapping  where
> object_master.object_id = object_category_mapping.object_id and
> object_category_mapping.category_id = category_master.category_id and
> object_master.object_id in (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?      )
> group by      category_master.category_name  order by      category_count
> desc
> Parse Time: 94
> Bind Time: 31
> Optimize Time: 125
> Generate Time: 94
> Compile Time: 344
> Execute Time: 33235
> Begin Compilation Timestamp : 2009-04-07 16:03:55.859
> End Compilation Timestamp : 2009-04-07 16:03:56.203
> Begin Execution Timestamp : 2009-04-07 16:03:56.296
> End Execution Timestamp : 2009-04-07 16:05:36.515
> Statement Execution Plan Text:
> Sort ResultSet:
> Number of opens = 1
> Rows input = 1068
> Rows returned = 1068
> Eliminate duplicates = false
> In sorted order = false
> Sort information:
>     Number of rows input=1068
>     Number of rows output=1068
>     Sort type=internal
>     constructor time (milliseconds) = 0
>     open time (milliseconds) = 33235
>     next time (milliseconds) = 0
>     close time (milliseconds) = 0
>     optimizer estimated row count:     18811881.00
>     optimizer estimated cost:     95276340.92
>
> Source result set:
>     Project-Restrict ResultSet (10):
>     Number of opens = 1
>     Rows seen = 1068
>     Rows filtered = 0
>     restriction = false
>     projection = true
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 33172
>         next time (milliseconds) = 47
>         close time (milliseconds) = 0
>         restriction time (milliseconds) = 0
>         projection time (milliseconds) = 0
>         optimizer estimated row count:     18811881.00
>         optimizer estimated cost:     95276340.92
>
>     Source result set:
>         Grouped Aggregate ResultSet:
>         Number of opens = 1
>         Rows input = 19000
>         Has distinct aggregate = false
>         In sorted order = false
>         Sort information:
>             Number of merge runs=1
>             Number of rows input=19000
>             Number of rows output=1084
>             Size of merge runs=[18220]
>             Sort type=external
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 33172
>             next time (milliseconds) = 47
>             close time (milliseconds) = 0
>             optimizer estimated row count:     18811881.00
>             optimizer estimated cost:     95276340.92
>
>         Source result set:
>             Project-Restrict ResultSet (9):
>             Number of opens = 1
>             Rows seen = 19000
>             Rows filtered = 0
>             restriction = false
>             projection = true
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 33001
>                 close time (milliseconds) = 16
>                 restriction time (milliseconds) = 0
>                 projection time (milliseconds) = 16
>                 optimizer estimated row count:     18811881.00
>                 optimizer estimated cost:     95276340.92
>
>             Source result set:
>                 Nested Loop Exists Join ResultSet:
>                 Number of opens = 1
>                 Rows seen from the left = 19000
>                 Rows seen from the right = 19000
>                 Rows filtered = 0
>                 Rows returned = 19000
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 32954
>                     close time (milliseconds) = 16
>                     optimizer estimated row count:     18811881.00
>                     optimizer estimated cost:     95276340.92
>
>                 Left result set:
>                     Nested Loop Join ResultSet:
>                     Number of opens = 1
>                     Rows seen from the left = 1000
>                     Rows seen from the right = 19000
>                     Rows filtered = 0
>                     Rows returned = 19000
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 0
>                         next time (milliseconds) = 32209
>                         close time (milliseconds) = 16
>                         optimizer estimated row count:     18811881.00
>                         optimizer estimated cost:      4772381.42
>
>                     Left result set:
>                         Project-Restrict ResultSet (5):
>                         Number of opens = 1
>                         Rows seen = 104896
>                         Rows filtered = 103896
>                         restriction = true
>                         projection = false
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 0
>                             next time (milliseconds) = 31769
>                             close time (milliseconds) = 16
>                             restriction time (milliseconds) = 30628
>                             projection time (milliseconds) = 0
>                             optimizer estimated row count:       297031.20
>                             optimizer estimated cost:        63997.96
>
>                         Source result set:
>                             Index Scan ResultSet for OBJECT_MASTER using
> constraint SQL090406104857810 at read committed isolation level using
> share
> row locking chosen by the optimizer
>                             Number of opens = 1
>                             Rows seen = 104896
>                             Rows filtered = 0
>                             Fetch Size = 1
>                                 constructor time (milliseconds) = 0
>                                 open time (milliseconds) = 31
>                                 next time (milliseconds) = 1126
>                                 close time (milliseconds) = 16
>                                 next time in milliseconds/row = 0
>
>                             scan information:
>                                 Bit set of columns fetched={0}
>                                 Number of columns fetched=1
>                                 Number of deleted rows visited=0
>                                 Number of pages visited=566
>                                 Number of rows qualified=104896
>                                 Number of rows visited=104897
>                                 Scan type=btree
>                                 Tree height=3
>                                 start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
>
>                                 stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
>
>                                 qualifiers:
> None
>                                 optimizer estimated row count:
> 297031.20
>                                 optimizer estimated cost:        63997.96
>
>                     Right result set:
>                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
> using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation
> level
> using share row locking chosen by the optimizer
>                         Number of opens = 1000
>                         Rows seen = 19000
>                         Rows filtered = 0
>                         Fetch Size = 1
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 47
>                             next time (milliseconds) = 408
>                             close time (milliseconds) = 16
>                             next time in milliseconds/row = 0
>
>                         scan information:
>                             Bit set of columns fetched={0, 1}
>                             Number of columns fetched=2
>                             Number of deleted rows visited=0
>                             Number of pages visited=4119
>                             Number of rows qualified=19000
>                             Number of rows visited=20000
>                             Scan type=btree
>                             Tree height=4
>                             start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             qualifiers:
> None
>                             optimizer estimated row count:     18811881.00
>                             optimizer estimated cost:      4708383.46
>
>
>                 Right result set:
>                     Index Row to Base Row ResultSet for CATEGORY_MASTER:
>                     Number of opens = 19000
>                     Rows seen = 19000
>                     Columns accessed from heap = {1}
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 186
>                         next time (milliseconds) = 497
>                         close time (milliseconds) = 0
>                         optimizer estimated row count:     18811881.00
>                         optimizer estimated cost:     90503959.49
>
>                         Index Scan ResultSet for CATEGORY_MASTER using
> constraint SQL090406104857680 at read committed isolation level using
> share
> row locking chosen by the optimizer
>                         Number of opens = 19000
>                         Rows seen = 19000
>                         Rows filtered = 0
>                         Fetch Size = 1
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 186
>                             next time (milliseconds) = 419
>                             close time (milliseconds) = 0
>                             next time in milliseconds/row = 0
>
>                         scan information:
>                             Bit set of columns fetched=All
>                             Number of columns fetched=2
>                             Number of deleted rows visited=0
>                             Number of pages visited=38000
>                             Number of rows qualified=19000
>                             Number of rows visited=19000
>                             Scan type=btree
>                             Tree height=2
>                             start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             qualifiers:
> None
>                             optimizer estimated row count:     18811881.00
>                             optimizer estimated cost:     90503959.49
>
> Best regards,
>
> Arindam.
> --
> View this message in context: http://www.nabble.com/URGENT%21%21%21-JDBC-
> SQL-query-taking-long-time-for-large-IN-clause-tp22927332p22927332.html
> Sent from the Apache Derby Users mailing list archive at Nabble.com.

Re: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by Arindam Bhattacharjee <mr...@gmail.com>.

I created the indexes one by one just to see whether the performance
improves or not. I don't have much knowledge of databases - I agree. I also
created temp table using the following command:

declare global temporary table temp_object_ids (object_id int) on commit
preserve rows not logged

Inserting 1000 records took around 800 millis with prepared insert statement
and executeBatch().

I just didn't have the patience to wait for the query to end - I waited for
around 5 mins before killing it - it just kept running endlessly.

I tried removing the object_master from the select clause - it was slower
than the query which I posted. The fastest query till now has been the one
with union - unfortunately, it just hogs too much memory during query
parsing.

Best regards,

Arindam.

On Tue, Apr 7, 2009 at 8:52 PM, <de...@segel.com> wrote:

> Wow, this really sounds like a class project.
> But since you've asked... lets look at a couple of things.
>
> First your query...
> That is a lot of values in the IN Clause. It will hurt performance on
> pretty
> much most databases. Derby had an issue with performance and IN, but I
> think
> they've fixed it as best they could about a year or two ago. (Sorry I'm a
> little fuzzy when it comes to time. Its all relative. ;-)
>
> Second, with your query, its better if your format the query so that it's
> easier to read. Trust me, it saves a lot of time when you have to look at a
> lot of queries and when you have visual cues you can get a better feel for
> your query.
>
> But let's take a step back and look at what you have...
>
> In simple terms, you have a basic object table, and a category table which
> you want to map. An object can exist in more than one category and you want
> to be able to retrieve the objects based on a long list of possible
> categories.
>
> Starting with your tables and Indexes...
> > CREATE TABLE APP.OBJECT_MASTER (
> >     OBJECT_ID INTEGER NOT NULL,
> >     OBJECT_CUID VARCHAR(32) NOT NULL,
> >     PRIMARY KEY (OBJECT_ID)
> > );
> Then you create this:
> > CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> > (OBJECT_ID,OBJECT_CUID)
>
> Ok, the first question is why? Is it that you want to only hit the index
> and
> not have to read a row from the table? FORGET ABOUT IT! At least for now...
>
> You already have a primary key on object_id and this second key will pretty
> much duplicate the table and you're doubling your storage requirements.
>
> > CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
> >     OBJECT_ID INTEGER NOT NULL,
> >     CATEGORY_ID INTEGER NOT NULL
> > );
> > CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> > APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)
> Note that since your query is looking at the objects and then wants the
> categories, this index is ok.
>
> And finally:
> > CREATE TABLE APP.CATEGORY_MASTER (
> >     CATEGORY_ID INTEGER NOT NULL,
> >     CATEGORY_NAME VARCHAR(255) NOT NULL,
> >     PARENT_ID INTEGER,
> >     PRIMARY KEY (CATEGORY_ID)
> > );
> > CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> > (CATEGORY_ID,CATEGORY_NAME)
>
> Again this index is redundant and all its really going to do is double your
> storage requirements.
>
> Now lets look at your query...
> I've reformatted it so that its easier to read:
>
> SELECT  B.category_name,
>        COUNT(B.category_name) AS category_count
> FROM    object_master A,
>        category_master B,
>        object_category_mapping C
> WHERE   A.object_id = C.object_id
> AND     C.category_id = B.category_id
> AND     A.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> [SNIP a lot of ?]      )
> GROUP BY      category_master.category_name
> ORDER BY      category_count DESC
>
> <RANT> One should always try to format their queries in to a readable
> pattern. This makes debugging and tuning easier. So take the extra 30
> seconds and clean up your code! If you use something like JIndent to format
> your java code, cleaning up your query syntax will make maintaining your
> code easier. </RANT>
>
>
> OK, NOW THAT YOU CAN READ YOUR CODE, CAN YOU SEE SOME PROBLEMS?
>
> 1) Why do you include your object_master table in your query?
>
> You said that your object_ids are coming from an external source. So why do
> you need to join to the object_master table? You already have your
> object_ids.
>
> Rewritten, this query should help...
> SELECT  B.category_name,
>        COUNT(B.category_name) AS category_count
> FROM    category_master B,
>        object_category_mapping C
> WHERE   C.category_id = B.category_id
> AND     C.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> [SNIP a lot of ?]      )
> GROUP BY      category_master.category_name
> ORDER BY      category_count DESC
>
> Note: Now you're joining 2 tables B and C and you're limiting you C based
> on
> a really, really ugly and long IN Clause.
>
> 2) There's something you're not telling us.
>
> It looks like you're generating this query on the fly. Because its dynamic,
> you're going to incur a bit of overhead in that each time you run the
> query,
> it will have to be prepared and then executed.
>
> Since we don't know much about your application, here's an alternative
> suggestion ... Look at using a temp table instead of the large IN clause.
>
> That is, create a temp table at the start of your app, prior to running the
> query, you will want to truncate the temp table and then populate it with
> the object_ids.
>
> Then your query, which you can now prepare, would look like this...
> SELECT  B.category_name,
>        COUNT(B.category_name) AS category_count
> FROM    category_master B,
>        object_category_mapping C
> WHERE   C.category_id = B.category_id
> AND     C.object_id IN(
>        SELECT object_id
>        FROM   temp_table)
> GROUP BY      B.category_name
> ORDER BY      category_count DESC
>
> This should improve the performance of your query.
>
> I would suggest looking at the time it takes to run the query using the IN
> CLAUSE versus the time it takes to truncate/populate the temp table and
> then
> run the query.
>
> Getting back to your use of the alternative index, you can add the indexes
> afterwards and then compare performances. I don't think that they'll have
> as
> major of an impact as you think.
>
> HTH
>
> -Mike
>
> > -----Original Message-----
> > From: arindam.bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com]
> > Sent: Tuesday, April 07, 2009 6:44 AM
> > To: derby-user@db.apache.org
> > Subject: URGENT!!! JDBC SQL query taking long time for large IN clause
> >
> >
> > Hello,
> >
> > I would like my query below to return within 100 millisecs. Please help
> > me,
> > and the values for the IN clause comes from outside hence cannot really
> > change the IN clause to a join on an existing table.
> >
> > time taken to prepare statement = 0
> > time taken execute statement = 33375
> > time taken enumerate result set = 63
> >
> > DDLs:
> >
> > CREATE TABLE APP.OBJECT_MASTER (
> >     OBJECT_ID INTEGER NOT NULL,
> >     OBJECT_CUID VARCHAR(32) NOT NULL,
> >     PRIMARY KEY (OBJECT_ID)
> > );
> >
> > CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
> >     OBJECT_ID INTEGER NOT NULL,
> >     CATEGORY_ID INTEGER NOT NULL
> > );
> >
> > CREATE TABLE APP.CATEGORY_MASTER (
> >     CATEGORY_ID INTEGER NOT NULL,
> >     CATEGORY_NAME VARCHAR(255) NOT NULL,
> >     PARENT_ID INTEGER,
> >     PRIMARY KEY (CATEGORY_ID)
> > );
> >
> > CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> > (OBJECT_ID,OBJECT_CUID)
> >
> > CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> > APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)
> >
> > CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> > (CATEGORY_ID,CATEGORY_NAME)
> >
> > Data inside the tables:
> > OBJECT_MASTER = 1,000,000 rows
> > CATEGORY_MASTER = 10,000 rows
> > OBJECT_CATEGORY_MAPPING = 20,000,000 rows
> >
> > SYSCS_GET_RUNTIMESTATISTICS() output:
> >
> >
> > Statement Name:
> >     null
> > Statement Text:
> >      select category_master.category_name,
> > count(category_master.category_name) as category_count  from
> > object_master,
> > category_master,  object_category_mapping  where
> > object_master.object_id = object_category_mapping.object_id and
> > object_category_mapping.category_id = category_master.category_id and
> > object_master.object_id in (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> > ?
> > ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
>  ?
> > ,
> > ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?      )
> > group by      category_master.category_name  order by      category_count
> > desc
> > Parse Time: 94
> > Bind Time: 31
> > Optimize Time: 125
> > Generate Time: 94
> > Compile Time: 344
> > Execute Time: 33235
> > Begin Compilation Timestamp : 2009-04-07 16:03:55.859
> > End Compilation Timestamp : 2009-04-07 16:03:56.203
> > Begin Execution Timestamp : 2009-04-07 16:03:56.296
> > End Execution Timestamp : 2009-04-07 16:05:36.515
> > Statement Execution Plan Text:
> > Sort ResultSet:
> > Number of opens = 1
> > Rows input = 1068
> > Rows returned = 1068
> > Eliminate duplicates = false
> > In sorted order = false
> > Sort information:
> >     Number of rows input=1068
> >     Number of rows output=1068
> >     Sort type=internal
> >     constructor time (milliseconds) = 0
> >     open time (milliseconds) = 33235
> >     next time (milliseconds) = 0
> >     close time (milliseconds) = 0
> >     optimizer estimated row count:     18811881.00
> >     optimizer estimated cost:     95276340.92
> >
> > Source result set:
> >     Project-Restrict ResultSet (10):
> >     Number of opens = 1
> >     Rows seen = 1068
> >     Rows filtered = 0
> >     restriction = false
> >     projection = true
> >         constructor time (milliseconds) = 0
> >         open time (milliseconds) = 33172
> >         next time (milliseconds) = 47
> >         close time (milliseconds) = 0
> >         restriction time (milliseconds) = 0
> >         projection time (milliseconds) = 0
> >         optimizer estimated row count:     18811881.00
> >         optimizer estimated cost:     95276340.92
> >
> >     Source result set:
> >         Grouped Aggregate ResultSet:
> >         Number of opens = 1
> >         Rows input = 19000
> >         Has distinct aggregate = false
> >         In sorted order = false
> >         Sort information:
> >             Number of merge runs=1
> >             Number of rows input=19000
> >             Number of rows output=1084
> >             Size of merge runs=[18220]
> >             Sort type=external
> >             constructor time (milliseconds) = 0
> >             open time (milliseconds) = 33172
> >             next time (milliseconds) = 47
> >             close time (milliseconds) = 0
> >             optimizer estimated row count:     18811881.00
> >             optimizer estimated cost:     95276340.92
> >
> >         Source result set:
> >             Project-Restrict ResultSet (9):
> >             Number of opens = 1
> >             Rows seen = 19000
> >             Rows filtered = 0
> >             restriction = false
> >             projection = true
> >                 constructor time (milliseconds) = 0
> >                 open time (milliseconds) = 0
> >                 next time (milliseconds) = 33001
> >                 close time (milliseconds) = 16
> >                 restriction time (milliseconds) = 0
> >                 projection time (milliseconds) = 16
> >                 optimizer estimated row count:     18811881.00
> >                 optimizer estimated cost:     95276340.92
> >
> >             Source result set:
> >                 Nested Loop Exists Join ResultSet:
> >                 Number of opens = 1
> >                 Rows seen from the left = 19000
> >                 Rows seen from the right = 19000
> >                 Rows filtered = 0
> >                 Rows returned = 19000
> >                     constructor time (milliseconds) = 0
> >                     open time (milliseconds) = 0
> >                     next time (milliseconds) = 32954
> >                     close time (milliseconds) = 16
> >                     optimizer estimated row count:     18811881.00
> >                     optimizer estimated cost:     95276340.92
> >
> >                 Left result set:
> >                     Nested Loop Join ResultSet:
> >                     Number of opens = 1
> >                     Rows seen from the left = 1000
> >                     Rows seen from the right = 19000
> >                     Rows filtered = 0
> >                     Rows returned = 19000
> >                         constructor time (milliseconds) = 0
> >                         open time (milliseconds) = 0
> >                         next time (milliseconds) = 32209
> >                         close time (milliseconds) = 16
> >                         optimizer estimated row count:     18811881.00
> >                         optimizer estimated cost:      4772381.42
> >
> >                     Left result set:
> >                         Project-Restrict ResultSet (5):
> >                         Number of opens = 1
> >                         Rows seen = 104896
> >                         Rows filtered = 103896
> >                         restriction = true
> >                         projection = false
> >                             constructor time (milliseconds) = 0
> >                             open time (milliseconds) = 0
> >                             next time (milliseconds) = 31769
> >                             close time (milliseconds) = 16
> >                             restriction time (milliseconds) = 30628
> >                             projection time (milliseconds) = 0
> >                             optimizer estimated row count:
> 297031.20
> >                             optimizer estimated cost:        63997.96
> >
> >                         Source result set:
> >                             Index Scan ResultSet for OBJECT_MASTER using
> > constraint SQL090406104857810 at read committed isolation level using
> > share
> > row locking chosen by the optimizer
> >                             Number of opens = 1
> >                             Rows seen = 104896
> >                             Rows filtered = 0
> >                             Fetch Size = 1
> >                                 constructor time (milliseconds) = 0
> >                                 open time (milliseconds) = 31
> >                                 next time (milliseconds) = 1126
> >                                 close time (milliseconds) = 16
> >                                 next time in milliseconds/row = 0
> >
> >                             scan information:
> >                                 Bit set of columns fetched={0}
> >                                 Number of columns fetched=1
> >                                 Number of deleted rows visited=0
> >                                 Number of pages visited=566
> >                                 Number of rows qualified=104896
> >                                 Number of rows visited=104897
> >                                 Scan type=btree
> >                                 Tree height=3
> >                                 start position:
> >     >= on first 1 column(s).
> >     Ordered null semantics on the following columns:
> >
> >                                 stop position:
> >     > on first 1 column(s).
> >     Ordered null semantics on the following columns:
> >
> >                                 qualifiers:
> > None
> >                                 optimizer estimated row count:
> > 297031.20
> >                                 optimizer estimated cost:        63997.96
> >
> >                     Right result set:
> >                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
> > using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation
> > level
> > using share row locking chosen by the optimizer
> >                         Number of opens = 1000
> >                         Rows seen = 19000
> >                         Rows filtered = 0
> >                         Fetch Size = 1
> >                             constructor time (milliseconds) = 0
> >                             open time (milliseconds) = 47
> >                             next time (milliseconds) = 408
> >                             close time (milliseconds) = 16
> >                             next time in milliseconds/row = 0
> >
> >                         scan information:
> >                             Bit set of columns fetched={0, 1}
> >                             Number of columns fetched=2
> >                             Number of deleted rows visited=0
> >                             Number of pages visited=4119
> >                             Number of rows qualified=19000
> >                             Number of rows visited=20000
> >                             Scan type=btree
> >                             Tree height=4
> >                             start position:
> >     >= on first 1 column(s).
> >     Ordered null semantics on the following columns:
> > 0
> >                             stop position:
> >     > on first 1 column(s).
> >     Ordered null semantics on the following columns:
> > 0
> >                             qualifiers:
> > None
> >                             optimizer estimated row count:
> 18811881.00
> >                             optimizer estimated cost:      4708383.46
> >
> >
> >                 Right result set:
> >                     Index Row to Base Row ResultSet for CATEGORY_MASTER:
> >                     Number of opens = 19000
> >                     Rows seen = 19000
> >                     Columns accessed from heap = {1}
> >                         constructor time (milliseconds) = 0
> >                         open time (milliseconds) = 186
> >                         next time (milliseconds) = 497
> >                         close time (milliseconds) = 0
> >                         optimizer estimated row count:     18811881.00
> >                         optimizer estimated cost:     90503959.49
> >
> >                         Index Scan ResultSet for CATEGORY_MASTER using
> > constraint SQL090406104857680 at read committed isolation level using
> > share
> > row locking chosen by the optimizer
> >                         Number of opens = 19000
> >                         Rows seen = 19000
> >                         Rows filtered = 0
> >                         Fetch Size = 1
> >                             constructor time (milliseconds) = 0
> >                             open time (milliseconds) = 186
> >                             next time (milliseconds) = 419
> >                             close time (milliseconds) = 0
> >                             next time in milliseconds/row = 0
> >
> >                         scan information:
> >                             Bit set of columns fetched=All
> >                             Number of columns fetched=2
> >                             Number of deleted rows visited=0
> >                             Number of pages visited=38000
> >                             Number of rows qualified=19000
> >                             Number of rows visited=19000
> >                             Scan type=btree
> >                             Tree height=2
> >                             start position:
> >     >= on first 1 column(s).
> >     Ordered null semantics on the following columns:
> > 0
> >                             stop position:
> >     > on first 1 column(s).
> >     Ordered null semantics on the following columns:
> > 0
> >                             qualifiers:
> > None
> >                             optimizer estimated row count:
> 18811881.00
> >                             optimizer estimated cost:     90503959.49
> >
> > Best regards,
> >
> > Arindam.
> > --
> > View this message in context:
> http://www.nabble.com/URGENT%21%21%21-JDBC-
> > SQL-query-taking-long-time-for-large-IN-clause-tp22927332p22927332.html
> > Sent from the Apache Derby Users mailing list archive at Nabble.com.
>
>
>
>

RE: URGENT!!! JDBC SQL query taking long time for large IN clause

Posted by de...@segel.com.

Wow, this really sounds like a class project.
But since you've asked... lets look at a couple of things.

First your query...
That is a lot of values in the IN Clause. It will hurt performance on pretty
much most databases. Derby had an issue with performance and IN, but I think
they've fixed it as best they could about a year or two ago. (Sorry I'm a
little fuzzy when it comes to time. Its all relative. ;-)

Second, with your query, its better if your format the query so that it's
easier to read. Trust me, it saves a lot of time when you have to look at a
lot of queries and when you have visual cues you can get a better feel for
your query.

But let's take a step back and look at what you have...

In simple terms, you have a basic object table, and a category table which
you want to map. An object can exist in more than one category and you want
to be able to retrieve the objects based on a long list of possible
categories.

Starting with your tables and Indexes...
> CREATE TABLE APP.OBJECT_MASTER (
>     OBJECT_ID INTEGER NOT NULL,
>     OBJECT_CUID VARCHAR(32) NOT NULL,
>     PRIMARY KEY (OBJECT_ID)
> );
Then you create this:
> CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> (OBJECT_ID,OBJECT_CUID)

Ok, the first question is why? Is it that you want to only hit the index and
not have to read a row from the table? FORGET ABOUT IT! At least for now...

You already have a primary key on object_id and this second key will pretty
much duplicate the table and you're doubling your storage requirements. 

> CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
>     OBJECT_ID INTEGER NOT NULL,
>     CATEGORY_ID INTEGER NOT NULL
> );
> CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)
Note that since your query is looking at the objects and then wants the
categories, this index is ok.

And finally:
> CREATE TABLE APP.CATEGORY_MASTER (
>     CATEGORY_ID INTEGER NOT NULL,
>     CATEGORY_NAME VARCHAR(255) NOT NULL,
>     PARENT_ID INTEGER,
>     PRIMARY KEY (CATEGORY_ID)
> );
> CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> (CATEGORY_ID,CATEGORY_NAME)

Again this index is redundant and all its really going to do is double your
storage requirements.  

Now lets look at your query...
I've reformatted it so that its easier to read:

SELECT  B.category_name, 
        COUNT(B.category_name) AS category_count  
FROM    object_master A,
        category_master B,  
        object_category_mapping C 
WHERE   A.object_id = C.object_id 
AND     C.category_id = B.category_id 
AND     A.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  
[SNIP a lot of ?]      )
GROUP BY      category_master.category_name  
ORDER BY      category_count DESC

<RANT> One should always try to format their queries in to a readable
pattern. This makes debugging and tuning easier. So take the extra 30
seconds and clean up your code! If you use something like JIndent to format
your java code, cleaning up your query syntax will make maintaining your
code easier. </RANT>


OK, NOW THAT YOU CAN READ YOUR CODE, CAN YOU SEE SOME PROBLEMS?

1) Why do you include your object_master table in your query?

You said that your object_ids are coming from an external source. So why do
you need to join to the object_master table? You already have your
object_ids.

Rewritten, this query should help...
SELECT  B.category_name, 
        COUNT(B.category_name) AS category_count  
FROM    category_master B,  
        object_category_mapping C 
WHERE   C.category_id = B.category_id 
AND     C.object_id IN (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  
[SNIP a lot of ?]      )
GROUP BY      category_master.category_name  
ORDER BY      category_count DESC

Note: Now you're joining 2 tables B and C and you're limiting you C based on
a really, really ugly and long IN Clause.

2) There's something you're not telling us.

It looks like you're generating this query on the fly. Because its dynamic,
you're going to incur a bit of overhead in that each time you run the query,
it will have to be prepared and then executed.

Since we don't know much about your application, here's an alternative
suggestion ... Look at using a temp table instead of the large IN clause.

That is, create a temp table at the start of your app, prior to running the
query, you will want to truncate the temp table and then populate it with
the object_ids.

Then your query, which you can now prepare, would look like this...
SELECT  B.category_name, 
        COUNT(B.category_name) AS category_count  
FROM    category_master B,  
        object_category_mapping C 
WHERE   C.category_id = B.category_id 
AND     C.object_id IN(
        SELECT object_id
        FROM   temp_table)
GROUP BY      B.category_name  
ORDER BY      category_count DESC

This should improve the performance of your query. 

I would suggest looking at the time it takes to run the query using the IN
CLAUSE versus the time it takes to truncate/populate the temp table and then
run the query.

Getting back to your use of the alternative index, you can add the indexes
afterwards and then compare performances. I don't think that they'll have as
major of an impact as you think.

HTH

-Mike

> -----Original Message-----
> From: arindam.bhattacharjee [mailto:mr.arindam.bhattacharjee@gmail.com]
> Sent: Tuesday, April 07, 2009 6:44 AM
> To: derby-user@db.apache.org
> Subject: URGENT!!! JDBC SQL query taking long time for large IN clause
> 
> 
> Hello,
> 
> I would like my query below to return within 100 millisecs. Please help
> me,
> and the values for the IN clause comes from outside hence cannot really
> change the IN clause to a join on an existing table.
> 
> time taken to prepare statement = 0
> time taken execute statement = 33375
> time taken enumerate result set = 63
> 
> DDLs:
> 
> CREATE TABLE APP.OBJECT_MASTER (
>     OBJECT_ID INTEGER NOT NULL,
>     OBJECT_CUID VARCHAR(32) NOT NULL,
>     PRIMARY KEY (OBJECT_ID)
> );
> 
> CREATE TABLE APP.OBJECT_CATEGORY_MAPPING (
>     OBJECT_ID INTEGER NOT NULL,
>     CATEGORY_ID INTEGER NOT NULL
> );
> 
> CREATE TABLE APP.CATEGORY_MASTER (
>     CATEGORY_ID INTEGER NOT NULL,
>     CATEGORY_NAME VARCHAR(255) NOT NULL,
>     PARENT_ID INTEGER,
>     PRIMARY KEY (CATEGORY_ID)
> );
> 
> CREATE INDEX APP.OBJECT_MASTER_INDEX ON APP.OBJECT_MASTER
> (OBJECT_ID,OBJECT_CUID)
> 
> CREATE INDEX APP.OBJECT_CATEGORY_MAPPING_INDEX ON
> APP.OBJECT_CATEGORY_MAPPING (OBJECT_ID,CATEGORY_ID)
> 
> CREATE INDEX APP.CATEGORY_MASTER_INDEX ON APP.CATEGORY_MASTER
> (CATEGORY_ID,CATEGORY_NAME)
> 
> Data inside the tables:
> OBJECT_MASTER = 1,000,000 rows
> CATEGORY_MASTER = 10,000 rows
> OBJECT_CATEGORY_MAPPING = 20,000,000 rows
> 
> SYSCS_GET_RUNTIMESTATISTICS() output:
> 
> 
> Statement Name:
>     null
> Statement Text:
>      select category_master.category_name,
> count(category_master.category_name) as category_count  from
> object_master,
> category_master,  object_category_mapping  where
> object_master.object_id = object_category_mapping.object_id and
> object_category_mapping.category_id = category_master.category_id and
> object_master.object_id in (  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,
> ?
> ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?
> ,
> ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ? ,  ?      )
> group by      category_master.category_name  order by      category_count
> desc
> Parse Time: 94
> Bind Time: 31
> Optimize Time: 125
> Generate Time: 94
> Compile Time: 344
> Execute Time: 33235
> Begin Compilation Timestamp : 2009-04-07 16:03:55.859
> End Compilation Timestamp : 2009-04-07 16:03:56.203
> Begin Execution Timestamp : 2009-04-07 16:03:56.296
> End Execution Timestamp : 2009-04-07 16:05:36.515
> Statement Execution Plan Text:
> Sort ResultSet:
> Number of opens = 1
> Rows input = 1068
> Rows returned = 1068
> Eliminate duplicates = false
> In sorted order = false
> Sort information:
>     Number of rows input=1068
>     Number of rows output=1068
>     Sort type=internal
>     constructor time (milliseconds) = 0
>     open time (milliseconds) = 33235
>     next time (milliseconds) = 0
>     close time (milliseconds) = 0
>     optimizer estimated row count:     18811881.00
>     optimizer estimated cost:     95276340.92
> 
> Source result set:
>     Project-Restrict ResultSet (10):
>     Number of opens = 1
>     Rows seen = 1068
>     Rows filtered = 0
>     restriction = false
>     projection = true
>         constructor time (milliseconds) = 0
>         open time (milliseconds) = 33172
>         next time (milliseconds) = 47
>         close time (milliseconds) = 0
>         restriction time (milliseconds) = 0
>         projection time (milliseconds) = 0
>         optimizer estimated row count:     18811881.00
>         optimizer estimated cost:     95276340.92
> 
>     Source result set:
>         Grouped Aggregate ResultSet:
>         Number of opens = 1
>         Rows input = 19000
>         Has distinct aggregate = false
>         In sorted order = false
>         Sort information:
>             Number of merge runs=1
>             Number of rows input=19000
>             Number of rows output=1084
>             Size of merge runs=[18220]
>             Sort type=external
>             constructor time (milliseconds) = 0
>             open time (milliseconds) = 33172
>             next time (milliseconds) = 47
>             close time (milliseconds) = 0
>             optimizer estimated row count:     18811881.00
>             optimizer estimated cost:     95276340.92
> 
>         Source result set:
>             Project-Restrict ResultSet (9):
>             Number of opens = 1
>             Rows seen = 19000
>             Rows filtered = 0
>             restriction = false
>             projection = true
>                 constructor time (milliseconds) = 0
>                 open time (milliseconds) = 0
>                 next time (milliseconds) = 33001
>                 close time (milliseconds) = 16
>                 restriction time (milliseconds) = 0
>                 projection time (milliseconds) = 16
>                 optimizer estimated row count:     18811881.00
>                 optimizer estimated cost:     95276340.92
> 
>             Source result set:
>                 Nested Loop Exists Join ResultSet:
>                 Number of opens = 1
>                 Rows seen from the left = 19000
>                 Rows seen from the right = 19000
>                 Rows filtered = 0
>                 Rows returned = 19000
>                     constructor time (milliseconds) = 0
>                     open time (milliseconds) = 0
>                     next time (milliseconds) = 32954
>                     close time (milliseconds) = 16
>                     optimizer estimated row count:     18811881.00
>                     optimizer estimated cost:     95276340.92
> 
>                 Left result set:
>                     Nested Loop Join ResultSet:
>                     Number of opens = 1
>                     Rows seen from the left = 1000
>                     Rows seen from the right = 19000
>                     Rows filtered = 0
>                     Rows returned = 19000
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 0
>                         next time (milliseconds) = 32209
>                         close time (milliseconds) = 16
>                         optimizer estimated row count:     18811881.00
>                         optimizer estimated cost:      4772381.42
> 
>                     Left result set:
>                         Project-Restrict ResultSet (5):
>                         Number of opens = 1
>                         Rows seen = 104896
>                         Rows filtered = 103896
>                         restriction = true
>                         projection = false
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 0
>                             next time (milliseconds) = 31769
>                             close time (milliseconds) = 16
>                             restriction time (milliseconds) = 30628
>                             projection time (milliseconds) = 0
>                             optimizer estimated row count:       297031.20
>                             optimizer estimated cost:        63997.96
> 
>                         Source result set:
>                             Index Scan ResultSet for OBJECT_MASTER using
> constraint SQL090406104857810 at read committed isolation level using
> share
> row locking chosen by the optimizer
>                             Number of opens = 1
>                             Rows seen = 104896
>                             Rows filtered = 0
>                             Fetch Size = 1
>                                 constructor time (milliseconds) = 0
>                                 open time (milliseconds) = 31
>                                 next time (milliseconds) = 1126
>                                 close time (milliseconds) = 16
>                                 next time in milliseconds/row = 0
> 
>                             scan information:
>                                 Bit set of columns fetched={0}
>                                 Number of columns fetched=1
>                                 Number of deleted rows visited=0
>                                 Number of pages visited=566
>                                 Number of rows qualified=104896
>                                 Number of rows visited=104897
>                                 Scan type=btree
>                                 Tree height=3
>                                 start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 
>                                 stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 
>                                 qualifiers:
> None
>                                 optimizer estimated row count:
> 297031.20
>                                 optimizer estimated cost:        63997.96
> 
>                     Right result set:
>                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
> using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation
> level
> using share row locking chosen by the optimizer
>                         Number of opens = 1000
>                         Rows seen = 19000
>                         Rows filtered = 0
>                         Fetch Size = 1
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 47
>                             next time (milliseconds) = 408
>                             close time (milliseconds) = 16
>                             next time in milliseconds/row = 0
> 
>                         scan information:
>                             Bit set of columns fetched={0, 1}
>                             Number of columns fetched=2
>                             Number of deleted rows visited=0
>                             Number of pages visited=4119
>                             Number of rows qualified=19000
>                             Number of rows visited=20000
>                             Scan type=btree
>                             Tree height=4
>                             start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             qualifiers:
> None
>                             optimizer estimated row count:     18811881.00
>                             optimizer estimated cost:      4708383.46
> 
> 
>                 Right result set:
>                     Index Row to Base Row ResultSet for CATEGORY_MASTER:
>                     Number of opens = 19000
>                     Rows seen = 19000
>                     Columns accessed from heap = {1}
>                         constructor time (milliseconds) = 0
>                         open time (milliseconds) = 186
>                         next time (milliseconds) = 497
>                         close time (milliseconds) = 0
>                         optimizer estimated row count:     18811881.00
>                         optimizer estimated cost:     90503959.49
> 
>                         Index Scan ResultSet for CATEGORY_MASTER using
> constraint SQL090406104857680 at read committed isolation level using
> share
> row locking chosen by the optimizer
>                         Number of opens = 19000
>                         Rows seen = 19000
>                         Rows filtered = 0
>                         Fetch Size = 1
>                             constructor time (milliseconds) = 0
>                             open time (milliseconds) = 186
>                             next time (milliseconds) = 419
>                             close time (milliseconds) = 0
>                             next time in milliseconds/row = 0
> 
>                         scan information:
>                             Bit set of columns fetched=All
>                             Number of columns fetched=2
>                             Number of deleted rows visited=0
>                             Number of pages visited=38000
>                             Number of rows qualified=19000
>                             Number of rows visited=19000
>                             Scan type=btree
>                             Tree height=2
>                             start position:
>     >= on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             stop position:
>     > on first 1 column(s).
>     Ordered null semantics on the following columns:
> 0
>                             qualifiers:
> None
>                             optimizer estimated row count:     18811881.00
>                             optimizer estimated cost:     90503959.49
> 
> Best regards,
> 
> Arindam.
> --
> View this message in context: http://www.nabble.com/URGENT%21%21%21-JDBC-
> SQL-query-taking-long-time-for-large-IN-clause-tp22927332p22927332.html
> Sent from the Apache Derby Users mailing list archive at Nabble.com.