You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "James Synge (JIRA)" <de...@db.apache.org> on 2006/09/05 19:30:25 UTC
[jira] Commented: (DERBY-47) Some possible improvements to IN optimization

    [ http://issues.apache.org/jira/browse/DERBY-47?page=comments#action_12432636 ] 
            
James Synge commented on DERBY-47:
----------------------------------

I spent much of last week tracking down a performance problem in an application that I'm working on, and it turned out to be described here.

I developed a test application (that I will attach) that explores different ways of doing essentially this query:

        SELECT * FROM myTable WHERE foreignKeyColumn IN (?, ..., ?)

We tried the following strategies:

Literals        - 1 query, using WHERE column IN ('literal1', ..., 'literalN')
Literal         - N queries, using WHERE column = 'literal[i]
Markers         - 1 query, using WHERE column IN (?, ..., ?)
Marker          - N queries, using WHERE column = ?
TempTable       - 1 query, store parameters in a temp table, use nested select, then delete parameters
ScratchPad      - 1 query, store parameters in a table, use nested select, then delete parameters
ScrSavepoint    - 1 query, set savepoint, store parameters in a table, use nested select, then rollback savepoint

We were astonished to find that converting the query to:

        SELECT * FROM myTable WHERE foreignKeyColumn = ?

And repeating that query N times was by far the best performer out of 7 different strategies I tried.  (This is what I call the Marker strategy above.)

Here are the results for a table with 100,000 rows in it, and then after that for 1,000,000 rows:
(Note: this table is tab delimited, for easy importing into Excel.  I'll also attach this data.)

	Literals		Literal		Markers		Marker		TempTable		ScratchPad		ScrSavepoint		
ID Count	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms
1	20	20	10	10	10	10	0	0	1232	1232	1132	1132	1022	1022
2	881	440	20	10	450	225	0	0	1022	511	1041	520	1042	521
3	1051	350	30	10	2794	931	0	0	1022	340	1022	340	1012	337
4	1012	253	40	10	2013	503	0	0	1032	258	1002	250	1202	300
5	1132	226	40	8	2053	410	0	0	1032	206	1022	204	1042	208
6	1042	173	50	8	1523	253	0	0	1012	168	1022	170	1051	175
7	1132	161	60	8	3145	449	0	0	1022	146	1032	147	1112	158
8	1102	137	60	7	3034	379	10	1	1062	132	1202	150	1082	135
9	1102	122	60	6	2965	329	0	0	1142	126	1151	127	992	110
10	1112	111	70	7	3526	352	0	0	1022	102	1052	105	1062	106
20	1142	57	120	6	3746	187	10	0	1022	51	1112	55	1232	61
30	1317	43	195	6	4117	137	10	0	1022	34	1082	36	1072	35
40	1252	31	250	6	4417	110	20	0	1022	25	1091	27	1282	32
50	1292	25	320	6	4777	95	20	0	1062	21	1052	21	1052	21
60	1327	22	350	5	5068	84	20	0	1062	17	1082	18	1112	18
70	1332	19	415	5	5504	78	30	0	1042	14	1142	16	1081	15
80	1327	16	471	5	5769	72	40	0	1041	13	1052	13	1277	15
90	1362	15	481	5	6330	70	40	0	1052	11	1152	12	1092	12
100	1372	13	536	5	6460	64	40	0	1283	12	1092	10	1202	12


=============================================================================================================================

	Literals		Literal		Markers		Marker		TempTable		ScratchPad		ScrSavepoint		
ID Count	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms	Total ms	Avg ms
1	160	160	70	70	40	40	41	41	69841	69841	44261	44261	36699	36699
2	44120	22060	181	90	222124	111062	0	0	8624	4312	6851	3425	6580	3290
3	10958	3652	120	40	12540	4180	10	3	6461	2153	6431	2143	6520	2173



> Some possible improvements to IN optimization
> ---------------------------------------------
>
>                 Key: DERBY-47
>                 URL: http://issues.apache.org/jira/browse/DERBY-47
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.0.2.0
>         Environment: all
>            Reporter: Sunitha Kambhampati
>         Attachments: QueryPlanUniqueIndexAndWordIndexOneTerm.txt, QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt
>
>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN list to get the start and stop values for the scan. Thus the range of the values in the "IN" list here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an index scan would be chosen.
> It would be nice if we could do something clever in this case where there is clearly an index on the field and the number of values in the IN list is known. Maybe use the rowcount estimate and the IN list size to do some optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a temporary table and do a join. It is most likely that the optimizer will choose the table with "IN" list here as the outer table in the join and thus will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
> 	constructor time (milliseconds) = 0
> 	open time (milliseconds) = 0
> 	next time (milliseconds) = 0
> 	close time (milliseconds) = 0
> 	restriction time (milliseconds) = 0
> 	projection time (milliseconds) = 0
> 	optimizer estimated row count:          750.38
> 	optimizer estimated cost:         8579.46
> Source result set:
> 	Table Scan ResultSet for SCANFIXED at read committed isolation level using instantaneous share row locking chosen by the optimizer
> 	Number of opens = 1
> 	Rows seen = 10000
> 	Rows filtered = 0
> 	Fetch Size = 16
> 		constructor time (milliseconds) = 0
> 		open time (milliseconds) = 0
> 		next time (milliseconds) = 0
> 		close time (milliseconds) = 0
> 		next time in milliseconds/row = 0
> 	scan information: 
> 		Bit set of columns fetched=All
> 		Number of columns fetched=9
> 		Number of pages visited=417
> 		Number of rows qualified=10000
> 		Number of rows visited=10000
> 		Scan type=heap
> 		start position: 
> null		stop position: 
> null		qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> 		optimizer estimated row count:          750.38
> 		optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 0), select * from scanfixed where i1 in (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
> 	constructor time (milliseconds) = 0
> 	open time (milliseconds) = 0
> 	next time (milliseconds) = 0
> 	close time (milliseconds) = 0
> 	restriction time (milliseconds) = 0
> 	projection time (milliseconds) = 0
> 	optimizer estimated row count:            4.80
> 	optimizer estimated cost:           39.53
> Source result set:
> 	Index Row to Base Row ResultSet for SCANFIXED:
> 	Number of opens = 1
> 	Rows seen = 10
> 	Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
> 		constructor time (milliseconds) = 0
> 		open time (milliseconds) = 0
> 		next time (milliseconds) = 0
> 		close time (milliseconds) = 0
> 		optimizer estimated row count:            4.80
> 		optimizer estimated cost:           39.53
> 		Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at read committed isolation level using instantaneous share row locking chosen by the optimizer
> 		Number of opens = 1
> 		Rows seen = 10
> 		Rows filtered = 0
> 		Fetch Size = 16
> 			constructor time (milliseconds) = 0
> 			open time (milliseconds) = 0
> 			next time (milliseconds) = 0
> 			close time (milliseconds) = 0
> 			next time in milliseconds/row = 0
> 		scan information: 
> 			Bit set of columns fetched=All
> 			Number of columns fetched=2
> 			Number of deleted rows visited=0
> 			Number of pages visited=2
> 			Number of rows qualified=10
> 			Number of rows visited=10
> 			Scan type=btree
> 			Tree height=2
> 			start position: 
> 	>= on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 			stop position: 
> 	> on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 			qualifiers:
> None
> 			optimizer estimated row count:            4.80
> 			optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira