You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "James Synge (JIRA)" <de...@db.apache.org> on 2006/09/05 19:30:25 UTC
[jira] Commented: (DERBY-47) Some possible improvements to IN
optimization
[ http://issues.apache.org/jira/browse/DERBY-47?page=comments#action_12432636 ]
James Synge commented on DERBY-47:
----------------------------------
I spent much of last week tracking down a performance problem in an application that I'm working on, and it turned out to be described here.
I developed a test application (that I will attach) that explores different ways of doing essentially this query:
SELECT * FROM myTable WHERE foreignKeyColumn IN (?, ..., ?)
We tried the following strategies:
Literals - 1 query, using WHERE column IN ('literal1', ..., 'literalN')
Literal - N queries, using WHERE column = 'literal[i]
Markers - 1 query, using WHERE column IN (?, ..., ?)
Marker - N queries, using WHERE column = ?
TempTable - 1 query, store parameters in a temp table, use nested select, then delete parameters
ScratchPad - 1 query, store parameters in a table, use nested select, then delete parameters
ScrSavepoint - 1 query, set savepoint, store parameters in a table, use nested select, then rollback savepoint
We were astonished to find that converting the query to:
SELECT * FROM myTable WHERE foreignKeyColumn = ?
And repeating that query N times was by far the best performer out of 7 different strategies I tried. (This is what I call the Marker strategy above.)
Here are the results for a table with 100,000 rows in it, and then after that for 1,000,000 rows:
(Note: this table is tab delimited, for easy importing into Excel. I'll also attach this data.)
Literals Literal Markers Marker TempTable ScratchPad ScrSavepoint
ID Count Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms
1 20 20 10 10 10 10 0 0 1232 1232 1132 1132 1022 1022
2 881 440 20 10 450 225 0 0 1022 511 1041 520 1042 521
3 1051 350 30 10 2794 931 0 0 1022 340 1022 340 1012 337
4 1012 253 40 10 2013 503 0 0 1032 258 1002 250 1202 300
5 1132 226 40 8 2053 410 0 0 1032 206 1022 204 1042 208
6 1042 173 50 8 1523 253 0 0 1012 168 1022 170 1051 175
7 1132 161 60 8 3145 449 0 0 1022 146 1032 147 1112 158
8 1102 137 60 7 3034 379 10 1 1062 132 1202 150 1082 135
9 1102 122 60 6 2965 329 0 0 1142 126 1151 127 992 110
10 1112 111 70 7 3526 352 0 0 1022 102 1052 105 1062 106
20 1142 57 120 6 3746 187 10 0 1022 51 1112 55 1232 61
30 1317 43 195 6 4117 137 10 0 1022 34 1082 36 1072 35
40 1252 31 250 6 4417 110 20 0 1022 25 1091 27 1282 32
50 1292 25 320 6 4777 95 20 0 1062 21 1052 21 1052 21
60 1327 22 350 5 5068 84 20 0 1062 17 1082 18 1112 18
70 1332 19 415 5 5504 78 30 0 1042 14 1142 16 1081 15
80 1327 16 471 5 5769 72 40 0 1041 13 1052 13 1277 15
90 1362 15 481 5 6330 70 40 0 1052 11 1152 12 1092 12
100 1372 13 536 5 6460 64 40 0 1283 12 1092 10 1202 12
=============================================================================================================================
Literals Literal Markers Marker TempTable ScratchPad ScrSavepoint
ID Count Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms Total ms Avg ms
1 160 160 70 70 40 40 41 41 69841 69841 44261 44261 36699 36699
2 44120 22060 181 90 222124 111062 0 0 8624 4312 6851 3425 6580 3290
3 10958 3652 120 40 12540 4180 10 3 6461 2153 6431 2143 6520 2173
> Some possible improvements to IN optimization
> ---------------------------------------------
>
> Key: DERBY-47
> URL: http://issues.apache.org/jira/browse/DERBY-47
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 10.0.2.0
> Environment: all
> Reporter: Sunitha Kambhampati
> Attachments: QueryPlanUniqueIndexAndWordIndexOneTerm.txt, QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt
>
>
> Consider a simple case of -
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is
> select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN list to get the start and stop values for the scan. Thus the range of the values in the "IN" list here plays an important role.
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an index scan would be chosen.
> It would be nice if we could do something clever in this case where there is clearly an index on the field and the number of values in the IN list is known. Maybe use the rowcount estimate and the IN list size to do some optimizations.
> - consider the length of the "IN" list to do searches on the table. ie use the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a temporary table and do a join. It is most likely that the optimizer will choose the table with "IN" list here as the outer table in the join and thus will do key searches on the larger table.
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 750.38
> optimizer estimated cost: 8579.46
> Source result set:
> Table Scan ResultSet for SCANFIXED at read committed isolation level using instantaneous share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=9
> Number of pages visited=417
> Number of rows qualified=10000
> Number of rows visited=10000
> Scan type=heap
> start position:
> null stop position:
> null qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> optimizer estimated row count: 750.38
> optimizer estimated cost: 8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 0), select * from scanfixed where i1 in (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
> Source result set:
> Index Row to Base Row ResultSet for SCANFIXED:
> Number of opens = 1
> Rows seen = 10
> Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
> Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at read committed isolation level using instantaneous share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=2
> Number of deleted rows visited=0
> Number of pages visited=2
> Number of rows qualified=10
> Number of rows visited=10
> Scan type=btree
> Tree height=2
> start position:
> >= on first 1 column(s).
> Ordered null semantics on the following columns:
> stop position:
> > on first 1 column(s).
> Ordered null semantics on the following columns:
> qualifiers:
> None
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira