Posted to derby-user@db.apache.org by Kevin Hore <kh...@araxis.com> on 2005/11/11 15:59:51 UTC
Poor query optimizer choices are making Derby unusable for large tables
The Derby query optimizer (this is with 10.1.1.0) decides to perform an
expensive table scan in certain circumstances, when it could make use of
available indexes. The resulting poor performance is rendering Derby
useless for our application.
I believe that this behaviour might be related to DERBY-47
(http://issues.apache.org/jira/browse/DERBY-47), which was opened in
October 2004. However, this seems such a severe and fundamental problem
that I'm surprised firstly that a significant portion of the Derby user
base isn't affected by this and, secondly, that DERBY-47 hasn't been
considered a critical defect to be fixed with the utmost urgency.
I've described the problem in detail below, and I'd appreciate any
assistance. Specifically:
i) Does anyone have any plans to fix this problem?
ii) In the meantime, are there any workarounds? I'd appreciate any
suggestions that would decrease the execution time of my second query
below (the one with two search terms). Likewise, any general
strategies for avoiding this problem with IN clauses would be appreciated.
----PROBLEM DESCRIPTION----
Consider the table:
CREATE TABLE tblSearchDictionary
(
ObjectId int NOT NULL,
ObjectType int NOT NULL,
Word VARCHAR(64) NOT NULL,
WordLocation int NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
);
This table has an index on each of the four columns; it also has the
unique index across all four columns, as defined above:
CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
(ObjectType);
CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
(WordLocation);
The table contains about 260,000 rows.
The following query selects all rows whose Word column matches a given
string, groups them by the ObjectId column, and sums the WordLocation
column for each group.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
GROUP BY ObjectId;
On my machine this will usually complete in an acceptable time of around
200ms.
Now consider the following query which adds a second search term on the
same column.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
This second query usually takes around 10000ms on my machine. My
understanding from the Derby optimizer docs and DERBY-47 is that this is
because Derby is re-writing the query along the following lines, and
then choosing to do a table scan:
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word IN ('CONTACT', 'ADD')
AND Word >= 'ADD'
AND Word <= 'CONTACT'
GROUP BY ObjectId;
The plan for the first query indicates that the tblSearchDictionaryWord
index is used to perform an index scan. However, the plan for the second
query indicates that the majority of the additional time is taken
performing a table scan over the entire table, instead of making use of
the indexes available. Our application uses IN quite frequently, so this
optimizer behaviour would seem to present a significant problem.
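One workaround in the spirit later suggested in this thread (split the OR
into two single-value scans, each able to use the Word index, then combine
the results) can be sketched as a UNION ALL rewrite. The query below is my
illustration, not a rewrite taken from the thread (Jeff L.'s suggested
version is not reproduced in this archive), and SQLite stands in for Derby
purely to check that both forms return the same rows over a small made-up
data set:

```python
# Illustrative sketch: OR predicate vs. an equivalent UNION ALL rewrite.
# SQLite is used only to verify the two forms agree; table and index names
# mirror the Derby example, the data set is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSearchDictionary (
        ObjectId INT NOT NULL,
        ObjectType INT NOT NULL,
        Word VARCHAR(64) NOT NULL,
        WordLocation INT NOT NULL
    );
    CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
""")
conn.executemany(
    "INSERT INTO tblSearchDictionary VALUES (?, ?, ?, ?)",
    [(1, 0, "CONTACT", 10), (1, 0, "ADD", 5), (2, 0, "ADD", 7), (3, 0, "OTHER", 1)],
)

# Original form: OR predicate on Word.
or_form = conn.execute("""
    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE Word = 'CONTACT' OR Word = 'ADD'
    GROUP BY ObjectId
    ORDER BY ObjectId
""").fetchall()

# Rewritten form: one single-value scan per search term, combined with
# UNION ALL, then aggregated once over the combined rows.
union_form = conn.execute("""
    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM (
        SELECT ObjectId, WordLocation FROM tblSearchDictionary
        WHERE Word = 'CONTACT'
        UNION ALL
        SELECT ObjectId, WordLocation FROM tblSearchDictionary
        WHERE Word = 'ADD'
    ) AS t
    GROUP BY ObjectId
    ORDER BY ObjectId
""").fetchall()

assert or_form == union_form  # same scores either way
```

Each branch of the UNION ALL carries only an equality predicate on Word, so
an optimizer that mishandles OR should still be able to use the
tblSearchDictionaryWord index for both scans instead of a full table scan.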
---QUERY PLAN FOR FIRST QUERY----
Statement Name:
null
Statement Text:
SELECT
ObjectId,
SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word = 'CONTACT'
GROUP BY
ObjectId
Parse Time: 0
Bind Time: 0
Optimize Time: 16
Generate Time: 0
Compile Time: 16
Execute Time: 0
Begin Compilation Timestamp : 2005-11-11 12:28:52.765
End Compilation Timestamp : 2005-11-11 12:28:52.781
Begin Execution Timestamp : 2005-11-11 13:08:15.406
End Execution Timestamp : 2005-11-11 13:08:15.406
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 93
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 226.00
Source result set:
Grouped Aggregate ResultSet:
Number of opens = 1
Rows input = 113
Has distinct aggregate = false
In sorted order = false
Sort information:
Number of rows input=113
Number of rows output=93
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 226.00
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 113
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
Source result set:
Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
Number of opens = 1
Rows seen = 113
Columns accessed from heap = {0, 3}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
Index Scan ResultSet for TBLSEARCHDICTIONARY using index
TBLSEARCHDICTIONARYWORD at read committed isolation level using share
row locking chosen by the optimizer
Number of opens = 1
Rows seen = 113
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=2
Number of deleted rows visited=0
Number of pages visited=4
Number of rows qualified=113
Number of rows visited=114
Scan type=btree
Tree height=3
start position:
>= on first 1 column(s).
Ordered null semantics on the following columns:
0
stop position:
> on first 1 column(s).
Ordered null semantics on the following columns:
0
qualifiers:
None
optimizer estimated row count: 118.00
optimizer estimated cost: 226.00
---QUERY PLAN FOR SECOND QUERY----
Statement Name:
null
Statement Text:
SELECT
ObjectId,
SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE
Word = 'CONTACT' OR Word = 'ADD'
GROUP BY
ObjectId
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 15
Compile Time: 15
Execute Time: 4250
Begin Compilation Timestamp : 2005-11-11 13:16:17.578
End Compilation Timestamp : 2005-11-11 13:16:17.593
Begin Execution Timestamp : 2005-11-11 13:16:17.593
End Execution Timestamp : 2005-11-11 13:16:27.437
Statement Execution Plan Text:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 100
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 4250
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 82959.49
Source result set:
Grouped Aggregate ResultSet:
Number of opens = 1
Rows input = 712
Has distinct aggregate = false
In sorted order = false
Sort information:
Number of rows input=712
Number of rows output=593
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 4250
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 82959.49
Source result set:
Project-Restrict ResultSet (4):
Number of opens = 1
Rows seen = 712
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4219
close time (milliseconds) = 15
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 40806
Rows filtered = 40094
restriction = true
projection = false
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4219
close time (milliseconds) = 15
restriction time (milliseconds) = 124
projection time (milliseconds) = 0
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
Source result set:
Table Scan ResultSet for TBLSEARCHDICTIONARY at read
committed
isolation level using share row locking chosen by the optimizer
Number of opens = 1
Rows seen = 40806
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4001
close time (milliseconds) = 15
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched={0, 2, 3}
Number of columns fetched=3
Number of pages visited=2978
Number of rows qualified=40806
Number of rows visited=256001
Scan type=heap
start position:
null
stop position:
null
qualifiers:
Column[0][0] Id: 2
Operator: <
Ordered nulls: false
Unknown return value: true
Negate comparison result: true
Column[0][1] Id: 2
Operator: <=
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 19200.45
optimizer estimated cost: 82959.49
----------
Thanks in advance for any help!
Kind regards,
Kevin Hore
Re: Poor query optimizer choices are making Derby unusable for large tables
Posted by Kevin Hore <kh...@araxis.com>.
Hi Satheesh
Satheesh Bandaram wrote:
> Hi Kevin,
>
> Kevin Hore wrote:
>
>
>>i) Does anyone have any plans to fix this problem?
>
>
> Can you file an enhancement request for this? I think Derby could
> improve its handling of OR/IN clauses. Many databases don't optimize OR
> clauses as much as possible, though some do better than others. It would
> be great if Derby could internally process this as two different scans
> (one for 'CONTACT' and another for 'ADD') and then combine the results.
> Some databases can do this. However, the workaround suggested by Jeff L.
> does this, though you have to rewrite the query.
I've commented on the re-write solution elsewhere. Regarding an
enhancement, I think DERBY-47 pretty much covers the problem, but I'll
add a comment to that describing my circumstances.
>
> Satheesh
Re: Poor query optimizer choices are making Derby unusable for large tables
Posted by Satheesh Bandaram <sa...@Sourcery.Org>.
Hi Kevin,
Kevin Hore wrote:
> i) Does anyone have any plans to fix this problem?
Can you file an enhancement request for this? I think Derby could
improve its handling of OR/IN clauses. Many databases don't optimize OR
clauses as much as possible, though some do better than others. It would
be great if Derby could internally process this as two different scans
(one for 'CONTACT' and another for 'ADD') and then combine the results.
Some databases can do this. However, the workaround suggested by Jeff L.
does this, though you have to rewrite the query.
Satheesh
Re: Poor query optimizer choices are making Derby unusable for large tables
Posted by Kevin Hore <kh...@araxis.com>.
Hi Rick,
Thanks for your suggested rewrite, but I'm really looking for a single
query. Unfortunately, we have quite a number of queries affected by this,
and it would be significant work to rewrite them all in this way. In
addition, some are considerably more complex than the example I gave, and
not so amenable to a relatively straightforward rewrite along the lines
you suggest.
Thanks again though, I appreciate the suggestion.
Kind regards,
Kevin Hore
Rick Hillegas wrote:
> Hi Kevin,
>
> You might also try using a temporary table to split your scan up into a
> series of optimizable queries. Regards, Rick
>
> declare global temporary table session.accumulator
> (
> ObjectId int NOT NULL,
> WordLocation int NOT NULL
> )
> not logged;
>
> insert into session.accumulator
> SELECT ObjectId, WordLocation
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> ;
> insert into session.accumulator
> SELECT ObjectId, WordLocation
> FROM tblSearchDictionary
> WHERE Word = 'ADD'
> ;
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM session.accumulator
> GROUP BY ObjectId;
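The temporary-table approach quoted above can be sanity-checked in
miniature. The sketch below transcribes it to SQLite TEMP tables (Derby's
DECLARE GLOBAL TEMPORARY TABLE syntax differs) over a made-up data set,
and confirms the accumulator yields the same scores as the single OR query
would:

```python
# Sketch of the quoted accumulator technique, using SQLite as a stand-in.
# Each INSERT ... SELECT scans with a single equality predicate on Word,
# which an index on Word could satisfy; aggregation happens once at the end.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblSearchDictionary (
        ObjectId INT NOT NULL,
        ObjectType INT NOT NULL,
        Word VARCHAR(64) NOT NULL,
        WordLocation INT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO tblSearchDictionary VALUES (?, ?, ?, ?)",
    [(1, 0, "CONTACT", 10), (1, 0, "ADD", 5), (2, 0, "ADD", 7)],
)

# The accumulator plays the role of session.accumulator in the Derby version.
conn.execute(
    "CREATE TEMP TABLE accumulator (ObjectId INT NOT NULL, WordLocation INT NOT NULL)"
)
for word in ("CONTACT", "ADD"):
    conn.execute(
        "INSERT INTO accumulator "
        "SELECT ObjectId, WordLocation FROM tblSearchDictionary WHERE Word = ?",
        (word,),
    )

scores = conn.execute("""
    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM accumulator
    GROUP BY ObjectId
    ORDER BY ObjectId
""").fetchall()
assert scores == [(1, 15), (2, 7)]  # ObjectId 1: 10+5, ObjectId 2: 7
```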
>
>
> Mamta Satoor wrote:
>
>> Hi Kevin,
>>
>> I haven't investigated Derby-47 to know how to fix the problem but I
>> do have an optimizer overrides patch waiting for review on the derby
>> developer list which will let user specify their own optimizer
>> properties to help the optimizer pick a specific plan. The JIRA entry
>> for optimizer overrides is Derby-573. The patch is for Derby 10.2.
>>
>> These optimizer properties can be handy in a case like yours where
>> until the optimizer is fixed, a user can tell the optimizer what plan
>> to use. Just an FYI as I know it doesn't help with your particular
>> case since there is no optimizer overrides support for 10.1.
>>
>> Mamta
>>
>> On 11/11/05, *Kevin Hore* <kh@araxis.com <ma...@araxis.com>> wrote:
>>
>> The Derby query optimizer (this is with 10.1.1.0
>> <http://10.1.1.0>) decides to perform an
>> expensive table scan in certain circumstances, when it could make
>> use of
>> available indexes. The resulting poor performance is rendering Derby
>> useless for our application.
>>
>> I believe that this behaviour might be related to DERBY-47
>> (http://issues.apache.org/jira/browse/DERBY-47), which was opened in
>> October 2004. However, this seems such a severe and fundamental
>> problem
>> that I'm surprised firstly that a significant portion of the Derby
>> user
>> base isn't affected by this and, secondly, that DERBY-47 hasn't been
>> considered a critical defect to be fixed with the utmost urgency.
>>
>> I've described the problem in detail below, and I'd appreciate any
>> assistance. Specifically:
>>
>> i) Does anyone have any plans to fix this problem?
>>
>> ii) In the meantime, are there any work-arounds? I'd appreciate any
>> suggestions that would decrease the execution time of my second query
>> below (the one with two search terms). Likewise, any general
>> strategies for avoiding this problem with IN clauses would be
>> appreciated.
>>
>>
>> ----PROBLEM DESCRIPTION----
>>
>> Consider the table:
>>
>> CREATE TABLE tblSearchDictionary
>> (
>> ObjectId int NOT NULL,
>> ObjectType int NOT NULL,
>> Word VARCHAR(64) NOT NULL,
>> WordLocation int NOT NULL,
>> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>> );
>>
>> This table has an index on each of the four columns, it also has the
>> unique index across all four columns as defined above:
>>
>> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>> (ObjectId);
>> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>> (ObjectType);
>> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>> (WordLocation);
>>
>> The table contains about 260,000 rows.
>>
>> The following query selects all rows that match a given string in
>> the Word column. It sums the WordLocation column, grouped by the
>> ObjectId column.
>>
>> SELECT ObjectId, SUM(WordLocation) AS Score
>> FROM tblSearchDictionary
>> WHERE Word = 'CONTACT'
>> GROUP BY ObjectId;
>>
>> On my machine this will usually complete in an acceptable time of
>> around
>> 200ms.
>>
>> Now consider the following query which adds a second search term
>> on the
>> same column.
>>
>> SELECT ObjectId, SUM(WordLocation) AS Score
>> FROM tblSearchDictionary
>> WHERE Word = 'CONTACT' OR Word = 'ADD'
>> GROUP BY ObjectId;
>>
>> This second query usually takes around 10000ms on my machine. My
>> understanding from the Derby optimizer docs and DERBY-47 is that
>> this is
>> because Derby is re-writing the query along the following lines, and
>> then choosing to do a table scan:
>>
>> SELECT ObjectId, SUM(WordLocation) AS Score
>> FROM tblSearchDictionary
>> WHERE
>> Word IN ('CONTACT', 'ADD')
>> AND Word >= 'ADD'
>> AND Word <= 'CONTACT'
>> GROUP BY ObjectId;
>>
>> The plan for the first query indicates that the
>> tblSearchDictionaryWord
>> index is used to perform an index scan. However, the plan for the
>> second
>> query indicates that the majority of the additional time is taken
>> performing a table scan over the entire table, instead of making
>> use of
>> the indexes available. Our application uses IN quite frequently,
>> so this
>> optimizer behaviour would seem to present a significant problem.
>>
>> ---QUERY PLAN FOR FIRST QUERY----
>>
>> Statement Name:
>> null
>> Statement Text:
>> SELECT
>> ObjectId,
>> SUM(WordLocation) AS Score
>> FROM tblSearchDictionary
>> WHERE
>> Word = 'CONTACT'
>> GROUP BY
>> ObjectId
>>
>> Parse Time: 0
>> Bind Time: 0
>> Optimize Time: 16
>> Generate Time: 0
>> Compile Time: 16
>> Execute Time: 0
>> Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>> End Compilation Timestamp : 2005-11-11 12:28:52.781
>> Begin Execution Timestamp : 2005-11-11 13:08:15.406
>> End Execution Timestamp : 2005-11-11 13:08:15.406
>> Statement Execution Plan Text:
>> Project-Restrict ResultSet (5):
>> Number of opens = 1
>> Rows seen = 93
>> Rows filtered = 0
>> restriction = false
>> projection = true
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> restriction time (milliseconds) = 0
>> projection time (milliseconds) = 0
>> optimizer estimated row count: 1.00
>> optimizer estimated cost: 226.00
>>
>> Source result set:
>> Grouped Aggregate ResultSet:
>> Number of opens = 1
>> Rows input = 113
>> Has distinct aggregate = false
>> In sorted order = false
>> Sort information:
>> Number of rows input=113
>> Number of rows output=93
>> Sort type=internal
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> optimizer estimated row count: 1.00
>> optimizer estimated cost: 226.00
>>
>> Source result set:
>> Project-Restrict ResultSet (4):
>> Number of opens = 1
>> Rows seen = 113
>> Rows filtered = 0
>> restriction = false
>> projection = true
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> restriction time (milliseconds) = 0
>> projection time (milliseconds) = 0
>> optimizer estimated row count: 118.00
>> optimizer estimated cost: 226.00
>>
>> Source result set:
>> Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>> Number of opens = 1
>> Rows seen = 113
>> Columns accessed from heap = {0, 3}
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> optimizer estimated row count: 118.00
>> optimizer estimated cost: 226.00
>>
>> Index Scan ResultSet for TBLSEARCHDICTIONARY using
>> index
>> TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>> row locking chosen by the optimizer
>> Number of opens = 1
>> Rows seen = 113
>> Rows filtered = 0
>> Fetch Size = 1
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> next time in milliseconds/row = 0
>>
>> scan information:
>> Bit set of columns fetched=All
>> Number of columns fetched=2
>> Number of deleted rows visited=0
>> Number of pages visited=4
>> Number of rows qualified=113
>> Number of rows visited=114
>> Scan type=btree
>> Tree height=3
>> start position:
>> >= on first 1 column(s).
>> Ordered null semantics on the following columns:
>> 0
>> stop position:
>> > on first 1 column(s).
>> Ordered null semantics on the following columns:
>> 0
>> qualifiers:
>> None
>> optimizer estimated row count: 118.00
>> optimizer estimated cost: 226.00
>>
>>
>> ---QUERY PLAN FOR SECOND QUERY----
>>
>> Statement Name:
>> null
>> Statement Text:
>> SELECT
>> ObjectId,
>> SUM(WordLocation) AS Score
>> FROM tblSearchDictionary
>> WHERE
>> Word = 'CONTACT' OR Word = 'ADD'
>> GROUP BY
>> ObjectId
>>
>> Parse Time: 0
>> Bind Time: 0
>> Optimize Time: 0
>> Generate Time: 15
>> Compile Time: 15
>> Execute Time: 4250
>> Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>> End Compilation Timestamp : 2005-11-11 13:16:17.593
>> Begin Execution Timestamp : 2005-11-11 13:16:17.593
>> End Execution Timestamp : 2005-11-11 13:16:27.437
>> Statement Execution Plan Text:
>> Project-Restrict ResultSet (5):
>> Number of opens = 1
>> Rows seen = 100
>> Rows filtered = 0
>> restriction = false
>> projection = true
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 4250
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> restriction time (milliseconds) = 0
>> projection time (milliseconds) = 0
>> optimizer estimated row count: 1.00
>> optimizer estimated cost: 82959.49
>>
>> Source result set:
>> Grouped Aggregate ResultSet:
>> Number of opens = 1
>> Rows input = 712
>> Has distinct aggregate = false
>> In sorted order = false
>> Sort information:
>> Number of rows input=712
>> Number of rows output=593
>> Sort type=internal
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 4250
>> next time (milliseconds) = 0
>> close time (milliseconds) = 0
>> optimizer estimated row count: 1.00
>> optimizer estimated cost: 82959.49
>>
>> Source result set:
>> Project-Restrict ResultSet (4):
>> Number of opens = 1
>> Rows seen = 712
>> Rows filtered = 0
>> restriction = false
>> projection = true
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 4219
>> close time (milliseconds) = 15
>> restriction time (milliseconds) = 0
>> projection time (milliseconds) = 0
>> optimizer estimated row count: 19200.45
>> optimizer estimated cost: 82959.49
>>
>> Source result set:
>> Project-Restrict ResultSet (3):
>> Number of opens = 1
>> Rows seen = 40806
>> Rows filtered = 40094
>> restriction = true
>> projection = false
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 4219
>> close time (milliseconds) = 15
>> restriction time (milliseconds) = 124
>> projection time (milliseconds) = 0
>> optimizer estimated row count: 19200.45
>> optimizer estimated cost: 82959.49
>>
>> Source result set:
>> Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>> committed
>> isolation level using share row locking chosen by the optimizer
>> Number of opens = 1
>> Rows seen = 40806
>> Rows filtered = 0
>> Fetch Size = 1
>> constructor time (milliseconds) = 0
>> open time (milliseconds) = 0
>> next time (milliseconds) = 4001
>> close time (milliseconds) = 15
>> next time in milliseconds/row = 0
>>
>> scan information:
>> Bit set of columns fetched={0, 2, 3}
>> Number of columns fetched=3
>> Number of pages visited=2978
>> Number of rows qualified=40806
>> Number of rows visited=256001
>> Scan type=heap
>> start position:
>> null
>> stop position:
>> null
>> qualifiers:
>> Column[0][0] Id: 2
>> Operator: <
>> Ordered nulls: false
>> Unknown return value: true
>> Negate comparison result: true
>> Column[0][1] Id: 2
>> Operator: <=
>> Ordered nulls: false
>> Unknown return value: false
>> Negate comparison result: false
>>
>> optimizer estimated row count: 19200.45
>> optimizer estimated cost: 82959.49
>>
>> ----------
>>
>> Thanks in advance for any help!
>>
>> Kind regards,
>>
>>
>> Kevin Hore
>>
>>
>
Re: Poor query optimizer choices is making Derby unusable for large
tables
Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Kevin,
You might also try using a temporary table to split your scan up into a
series of optimizable queries. Regards-Rick
declare global temporary table session.accumulator
(
ObjectId int NOT NULL,
WordLocation int NOT NULL
)
not logged;
insert into session.accumulator
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
;
insert into session.accumulator
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'ADD'
;
SELECT ObjectId, SUM(WordLocation) AS Score
FROM session.accumulator
GROUP BY ObjectId;
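One practical detail if the accumulator table is reused for several
searches on the same connection: rows from the previous search should be
cleared first. A minimal sketch (the search term 'REMOVE' below is
hypothetical, and whether rows survive a commit depends on the temporary
table's ON COMMIT behavior):

```sql
-- Clear hits accumulated by any previous search on this connection.
DELETE FROM session.accumulator;

-- Then accumulate one indexed, single-value lookup per search term,
-- as above ('REMOVE' is a hypothetical additional term).
INSERT INTO session.accumulator
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'REMOVE';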
Mamta Satoor wrote:
> Hi Kevin,
>
> I haven't investigated Derby-47 to know how to fix the problem but I
> do have an optimizer overrides patch waiting for review on the derby
> developer list which will let users specify their own optimizer
> properties to help the optimizer pick a specific plan. The JIRA entry
> for optimizer overrides is Derby-573. The patch is for Derby 10.2.
>
> These optimizer properties can be handy in a case like yours where
> until the optimizer is fixed, a user can tell the optimizer what plan
> to use. Just an FYI as I know it doesn't help with your particular
> case since there is no optimizer overrides support for 10.1.
>
> Mamta
>
> On 11/11/05, *Kevin Hore* <kh@araxis.com <ma...@araxis.com>> wrote:
>
> The Derby query optimizer (this is with 10.1.1.0) decides to perform an
> expensive table scan in certain circumstances, when it could make
> use of
> available indexes. The resulting poor performance is rendering Derby
> useless for our application.
>
> I believe that this behaviour might be related to DERBY-47
> (http://issues.apache.org/jira/browse/DERBY-47), which was opened in
> October 2004. However, this seems such a severe and fundamental
> problem
> that I'm surprised firstly that a significant portion of the Derby
> user
> base isn't affected by this and, secondly, that DERBY-47 hasn't been
> considered a critical defect to be fixed with the utmost urgency.
>
> I've described the problem in detail below, and I'd appreciate any
> assistance. Specifically:
>
> i) Does anyone have any plans to fix this problem?
>
> ii) In the meantime, are there any work-arounds? I'd appreciate any
> suggestions that would decrease the execution time of my second query
> below (the one with two search terms). Likewise, any general
> strategies for avoiding this problem with IN clauses would be
> appreciated.
>
>
Re: Poor query optimizer choices is making Derby unusable for large
tables
Posted by Kevin Hore <kh...@araxis.com>.
Hi Mamta,
Thank you for your suggestion, but this is for a production system and
so, as you suspected, we don't really want to use an alpha version of Derby.
Kind regards,
Kevin Hore
Mamta Satoor wrote:
> Hi Kevin,
> I haven't investigated Derby-47 to know how to fix the problem but I do
> have an optimizer overrides patch waiting for review on the derby developer
> list which will let users specify their own optimizer properties to help the
> optimizer pick a specific plan. The JIRA entry for optimizer overrides is
> Derby-573. The patch is for Derby 10.2.
> These optimizer properties can be handy in a case like yours where until
> the optimizer is fixed, a user can tell the optimizer what plan to use. Just
> an FYI as I know it doesn't help with your particular case since there is no
> optimizer overrides support for 10.1.
> Mamta
> On 11/11/05, Kevin Hore <kh...@araxis.com> wrote:
>
>>The Derby query optimizer (this is with 10.1.1.0)
>>decides to perform an
>>expensive table scan in certain circumstances, when it could make use of
>>available indexes. The resulting poor performance is rendering Derby
>>useless for our application.
>>
>>I believe that this behaviour might be related to DERBY-47
>>(http://issues.apache.org/jira/browse/DERBY-47), which was opened in
>>October 2004. However, this seems such a severe and fundamental problem
>>that I'm surprised firstly that a significant portion of the Derby user
>>base isn't affected by this and, secondly, that DERBY-47 hasn't been
>>considered a critical defect to be fixed with the utmost urgency.
>>
>>I've described the problem in detail below, and I'd appreciate any
>>assistance. Specifically:
>>
>>i) Does anyone have any plans to fix this problem?
>>
>>ii) In the meantime, are there any work-arounds? I'd appreciate any
>>suggestions that would decrease the execution time of my second query
>>below (the one with two search terms). Likewise, any general
>>strategies for avoiding this problem with IN clauses would be appreciated.
>>
>>
>>----PROBLEM DESCRIPTION----
>>
>>Consider the table:
>>
>>CREATE TABLE tblSearchDictionary
>>(
>>ObjectId int NOT NULL,
>>ObjectType int NOT NULL,
>>Word VARCHAR(64) NOT NULL,
>>WordLocation int NOT NULL,
>>CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>);
>>
>>This table has an index on each of the four columns, it also has the
>>unique index across all four columns as defined above:
>>
>>CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
>>(ObjectId);
>>CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>(ObjectType);
>>CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>(WordLocation);
>>
>>The table contains about 260,000 rows.
>>
>>The following query selects all rows that match instances of string in
>>the Word column. It sums the WordLocation column having grouped by the
>>ObjectId column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT'
>>GROUP BY ObjectId;
>>
>>On my machine this will usually complete in an acceptable time of around
>>200ms.
>>
>>Now consider the following query which adds a second search term on the
>>same column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY ObjectId;
>>
>>This second query usually takes around 10000ms on my machine. My
>>understanding from the Derby optimizer docs and DERBY-47 is that this is
>>because Derby is re-writing the query along the following lines, and
>>then choosing to do a table scan:
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word IN ('CONTACT', 'ADD')
>>AND Word >= 'ADD'
>>AND Word <= 'CONTACT'
>>GROUP BY ObjectId;
>>
>>The plan for the first query indicates that the tblSearchDictionaryWord
>>index is used to perform an index scan. However, the plan for the second
>>query indicates that the majority of the additional time is taken
>>performing a table scan over the entire table, instead of making use of
>>the indexes available. Our application uses IN quite frequently, so this
>>optimizer behaviour would seem to present a significant problem.
>>
>>---QUERY PLAN FOR FIRST QUERY----
>>
>>Statement Name:
>>null
>>Statement Text:
>>SELECT
>>ObjectId,
>>SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word = 'CONTACT'
>>GROUP BY
>>ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 16
>>Generate Time: 0
>>Compile Time: 16
>>Execute Time: 0
>>Begin Compilation Timestamp : 2005-11-11 12:28:52.765
>>End Compilation Timestamp : 2005-11-11 12:28:52.781
>>Begin Execution Timestamp : 2005-11-11 13:08:15.406
>>End Execution Timestamp : 2005-11-11 13:08:15.406
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 93
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Grouped Aggregate ResultSet:
>>Number of opens = 1
>>Rows input = 113
>>Has distinct aggregate = false
>>In sorted order = false
>>Sort information:
>>Number of rows input=113
>>Number of rows output=93
>>Sort type=internal
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Project-Restrict ResultSet (4):
>>Number of opens = 1
>>Rows seen = 113
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>Source result set:
>>Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
>>Number of opens = 1
>>Rows seen = 113
>>Columns accessed from heap = {0, 3}
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>Index Scan ResultSet for TBLSEARCHDICTIONARY using index
>>TBLSEARCHDICTIONARYWORD at read committed isolation level using share
>>row locking chosen by the optimizer
>>Number of opens = 1
>>Rows seen = 113
>>Rows filtered = 0
>>Fetch Size = 1
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>next time in milliseconds/row = 0
>>
>>scan information:
>>Bit set of columns fetched=All
>>Number of columns fetched=2
>>Number of deleted rows visited=0
>>Number of pages visited=4
>>Number of rows qualified=113
>>Number of rows visited=114
>>Scan type=btree
>>Tree height=3
>>start position:
>>
>>>= on first 1 column(s).
>>
>>Ordered null semantics on the following columns:
>>0
>>stop position:
>>
>>>on first 1 column(s).
>>
>>Ordered null semantics on the following columns:
>>0
>>qualifiers:
>>None
>>optimizer estimated row count: 118.00
>>optimizer estimated cost: 226.00
>>
>>
>>---QUERY PLAN FOR SECOND QUERY----
>>
>>Statement Name:
>>null
>>Statement Text:
>>SELECT
>>ObjectId,
>>SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE
>>Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY
>>ObjectId
>>
>>Parse Time: 0
>>Bind Time: 0
>>Optimize Time: 0
>>Generate Time: 15
>>Compile Time: 15
>>Execute Time: 4250
>>Begin Compilation Timestamp : 2005-11-11 13:16:17.578
>>End Compilation Timestamp : 2005-11-11 13:16:17.593
>>Begin Execution Timestamp : 2005-11-11 13:16:17.593
>>End Execution Timestamp : 2005-11-11 13:16:27.437
>>Statement Execution Plan Text:
>>Project-Restrict ResultSet (5):
>>Number of opens = 1
>>Rows seen = 100
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 4250
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Grouped Aggregate ResultSet:
>>Number of opens = 1
>>Rows input = 712
>>Has distinct aggregate = false
>>In sorted order = false
>>Sort information:
>>Number of rows input=712
>>Number of rows output=593
>>Sort type=internal
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 4250
>>next time (milliseconds) = 0
>>close time (milliseconds) = 0
>>optimizer estimated row count: 1.00
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Project-Restrict ResultSet (4):
>>Number of opens = 1
>>Rows seen = 712
>>Rows filtered = 0
>>restriction = false
>>projection = true
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4219
>>close time (milliseconds) = 15
>>restriction time (milliseconds) = 0
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Project-Restrict ResultSet (3):
>>Number of opens = 1
>>Rows seen = 40806
>>Rows filtered = 40094
>>restriction = true
>>projection = false
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4219
>>close time (milliseconds) = 15
>>restriction time (milliseconds) = 124
>>projection time (milliseconds) = 0
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>Source result set:
>>Table Scan ResultSet for TBLSEARCHDICTIONARY at read
>>committed
>>isolation level using share row locking chosen by the optimizer
>>Number of opens = 1
>>Rows seen = 40806
>>Rows filtered = 0
>>Fetch Size = 1
>>constructor time (milliseconds) = 0
>>open time (milliseconds) = 0
>>next time (milliseconds) = 4001
>>close time (milliseconds) = 15
>>next time in milliseconds/row = 0
>>
>>scan information:
>>Bit set of columns fetched={0, 2, 3}
>>Number of columns fetched=3
>>Number of pages visited=2978
>>Number of rows qualified=40806
>>Number of rows visited=256001
>>Scan type=heap
>>start position:
>>null
>>stop position:
>>null
>>qualifiers:
>>Column[0][0] Id: 2
>>Operator: <
>>Ordered nulls: false
>>Unknown return value: true
>>Negate comparison result: true
>>Column[0][1] Id: 2
>>Operator: <=
>>Ordered nulls: false
>>Unknown return value: false
>>Negate comparison result: false
>>
>>optimizer estimated row count: 19200.45
>>optimizer estimated cost: 82959.49
>>
>>----------
>>
>>Thanks in advance for any help!
>>
>>Kind regards,
>>
>>
>>Kevin Hore
>>
>>
>
>
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Mamta Satoor <ms...@gmail.com>.
Hi Kevin,
I haven't investigated Derby-47 to know how to fix the problem, but I do
have an optimizer overrides patch waiting for review on the derby developer
list which will let users specify their own optimizer properties to help the
optimizer pick a specific plan. The JIRA entry for optimizer overrides is
Derby-573. The patch is for Derby 10.2.
These optimizer properties can be handy in a case like yours: until the
optimizer is fixed, a user can tell the optimizer which plan to use. Just
an FYI, as I know it doesn't help with your particular case since there is no
optimizer overrides support for 10.1.
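For illustration, with that patch applied an override forcing the index from the earlier posts would look roughly like this (a sketch of the DERBY-573 syntax as proposed for 10.2, not a tested statement, so treat the exact form as subject to change):

```sql
-- Sketch of the proposed DERBY-PROPERTIES override; the end-of-line
-- comment after the table name tells the optimizer which index to use.
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary --DERBY-PROPERTIES index=tblSearchDictionaryWord
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
```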
Mamta
On 11/11/05, Kevin Hore <kh...@araxis.com> wrote:
>
> The Derby query optimizer (this is with 10.1.1.0)
> decides to perform an
> expensive table scan in certain circumstances, when it could make use of
> available indexes. The resulting poor performance is rendering Derby
> useless for our application.
>
> I believe that this behaviour might be related to DERBY-47
> (http://issues.apache.org/jira/browse/DERBY-47), which was opened in
> October 2004. However, this seems such a severe and fundamental problem
> that I'm surprised firstly that a significant portion of the Derby user
> base isn't affected by this and, secondly, that DERBY-47 hasn't been
> considered a critical defect to be fixed with the utmost urgency.
>
> I've described the problem in detail below, and I'd appreciate any
> assistance. Specifically:
>
> i) Does anyone have any plans to fix this problem?
>
> ii) In the meantime, are there any work-arounds? I'd appreciate any
> suggestions that would decrease the execution time of my second query
> below (the one with two search terms). Likewise, any general
> strategies for avoiding this problem with IN clauses would be appreciated.
>
>
> ----PROBLEM DESCRIPTION----
>
> Consider the table:
>
> CREATE TABLE tblSearchDictionary
> (
> ObjectId int NOT NULL,
> ObjectType int NOT NULL,
> Word VARCHAR(64) NOT NULL,
> WordLocation int NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
> );
>
> This table has an index on each of the four columns, it also has the
> unique index across all four columns as defined above:
>
> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
> (ObjectId);
> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
> (ObjectType);
> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
> (WordLocation);
>
> The table contains about 260,000 rows.
>
> The following query selects all rows that match instances of string in
> the Word column. It sums the WordLocation column having grouped by the
> ObjectId column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> GROUP BY ObjectId;
>
> On my machine this will usually complete in an acceptable time of around
> 200ms.
>
> Now consider the following query which adds a second search term on the
> same column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;
>
> This second query usually takes around 10000ms on my machine. My
> understanding from the Derby optimizer docs and DERBY-47 is that this is
> because Derby is re-writing the query along the following lines, and
> then choosing to do a table scan:
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word IN ('CONTACT', 'ADD')
> AND Word >= 'ADD'
> AND Word <= 'CONTACT'
> GROUP BY ObjectId;
>
> The plan for the first query indicates that the tblSearchDictionaryWord
> index is used to perform an index scan. However, the plan for the second
> query indicates that the majority of the additional time is taken
> performing a table scan over the entire table, instead of making use of
> the indexes available. Our application uses IN quite frequently, so this
> optimizer behaviour would seem to present a significant problem.
>
> ---QUERY PLAN FOR FIRST QUERY----
>
> Statement Name:
> null
> Statement Text:
> SELECT
> ObjectId,
> SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word = 'CONTACT'
> GROUP BY
> ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 16
> Generate Time: 0
> Compile Time: 16
> Execute Time: 0
> Begin Compilation Timestamp : 2005-11-11 12:28:52.765
> End Compilation Timestamp : 2005-11-11 12:28:52.781
> Begin Execution Timestamp : 2005-11-11 13:08:15.406
> End Execution Timestamp : 2005-11-11 13:08:15.406
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 93
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Grouped Aggregate ResultSet:
> Number of opens = 1
> Rows input = 113
> Has distinct aggregate = false
> In sorted order = false
> Sort information:
> Number of rows input=113
> Number of rows output=93
> Sort type=internal
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Project-Restrict ResultSet (4):
> Number of opens = 1
> Rows seen = 113
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
> Source result set:
> Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
> Number of opens = 1
> Rows seen = 113
> Columns accessed from heap = {0, 3}
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
> Index Scan ResultSet for TBLSEARCHDICTIONARY using index
> TBLSEARCHDICTIONARYWORD at read committed isolation level using share
> row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 113
> Rows filtered = 0
> Fetch Size = 1
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
>
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=2
> Number of deleted rows visited=0
> Number of pages visited=4
> Number of rows qualified=113
> Number of rows visited=114
> Scan type=btree
> Tree height=3
> start position:
> >= on first 1 column(s).
> Ordered null semantics on the following columns:
> 0
> stop position:
> > on first 1 column(s).
> Ordered null semantics on the following columns:
> 0
> qualifiers:
> None
> optimizer estimated row count: 118.00
> optimizer estimated cost: 226.00
>
>
> ---QUERY PLAN FOR SECOND QUERY----
>
> Statement Name:
> null
> Statement Text:
> SELECT
> ObjectId,
> SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE
> Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY
> ObjectId
>
> Parse Time: 0
> Bind Time: 0
> Optimize Time: 0
> Generate Time: 15
> Compile Time: 15
> Execute Time: 4250
> Begin Compilation Timestamp : 2005-11-11 13:16:17.578
> End Compilation Timestamp : 2005-11-11 13:16:17.593
> Begin Execution Timestamp : 2005-11-11 13:16:17.593
> End Execution Timestamp : 2005-11-11 13:16:27.437
> Statement Execution Plan Text:
> Project-Restrict ResultSet (5):
> Number of opens = 1
> Rows seen = 100
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 4250
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 82959.49
>
> Source result set:
> Grouped Aggregate ResultSet:
> Number of opens = 1
> Rows input = 712
> Has distinct aggregate = false
> In sorted order = false
> Sort information:
> Number of rows input=712
> Number of rows output=593
> Sort type=internal
> constructor time (milliseconds) = 0
> open time (milliseconds) = 4250
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 82959.49
>
> Source result set:
> Project-Restrict ResultSet (4):
> Number of opens = 1
> Rows seen = 712
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4219
> close time (milliseconds) = 15
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> Source result set:
> Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 40806
> Rows filtered = 40094
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4219
> close time (milliseconds) = 15
> restriction time (milliseconds) = 124
> projection time (milliseconds) = 0
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> Source result set:
> Table Scan ResultSet for TBLSEARCHDICTIONARY at read
> committed
> isolation level using share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 40806
> Rows filtered = 0
> Fetch Size = 1
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 4001
> close time (milliseconds) = 15
> next time in milliseconds/row = 0
>
> scan information:
> Bit set of columns fetched={0, 2, 3}
> Number of columns fetched=3
> Number of pages visited=2978
> Number of rows qualified=40806
> Number of rows visited=256001
> Scan type=heap
> start position:
> null
> stop position:
> null
> qualifiers:
> Column[0][0] Id: 2
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> Column[0][1] Id: 2
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
>
> optimizer estimated row count: 19200.45
> optimizer estimated cost: 82959.49
>
> ----------
>
> Thanks in advance for any help!
>
> Kind regards,
>
>
> Kevin Hore
>
>
SV: SV: Is Hibernate inappropriate for embedded databases?
Posted by Henrik Johansson <he...@vocab.se>.
I am not sure how the Store layer cache is used in standard JDBC
usage, but it seems like using Hibernate would not consume more memory
than executing an SQL statement and creating, from the result set, the
objects that you would use in your application.
On the contrary, having caching in the DAO layer will probably make it
more efficient since you ask the cache rather than the DB for data.
I am not at all an expert in this field but the simplicity that comes with
using Hibernate should not be discarded without careful consideration.
Maybe we have some gurus out there that have more experience with the
Derby/Hibernate combination...?
/ Henrik
-----Ursprungligt meddelande-----
Från: Michael McCutcheon [mailto:michaelm001@frontiernet.net]
Skickat: den 18 november 2005 15:47
Till: Derby Discussion
Ämne: Re: SV: Is Hibernate inappropriate for embedded databases?
But what about the 'double caching' problems for an embedded scenario?
Henrik Johansson wrote:
>I tried JBoss Embedded which uses Hibernate for a small test
>and it worked like a charm with Derby.
>
>Using Hibernate alone (or with JBoss Embedded) seems to be a
>very good idea in my opinion for all sorts of reasons.
>
>Too bad it's seemingly not production ready...
>
>Hibernate alone would be clever since it is very well tested and
>there is no point in reinventing the wheel.
>
>/ Henrik
>
>
>-----Ursprungligt meddelande-----
>Från: Michael McCutcheon [mailto:michaelm001@frontiernet.net]
>Skickat: den 18 november 2005 05:33
>Till: Derby Discussion
>Ämne: Is Hibernate inappropriate for embedded databases?
>
>
>I'm still struggling to figure out if I should go with straight JDBC,
>stored procedures or something like Hibernate for my data access/update
>in my web app running on Tomcat.
>
>I've been looking at Hibernate, and it seems that its claim to fame is
>all of the fancy caching it can do to speed performance.
>
>However, doesn't derby itself cache data in memory (once the data is
>accessed)?
>
>If the caching functionality in Hibernate is used for an embedded
>database, doesn't that just mean that the data will be cached twice
>(once in derby, once in hibernate), resulting in memory bloat and
>potentially worse performance?
>
>Any thoughts on this issue would be appreciated.
>
>Mike
>
>
>
>
>
>
Re: SV: Is Hibernate inappropriate for embedded databases?
Posted by Michael McCutcheon <mi...@frontiernet.net>.
But what about the 'double caching' problems for an embedded scenario?
Henrik Johansson wrote:
>I tried JBoss Embedded which uses Hibernate for a small test
>and it worked like a charm with Derby.
>
>Using Hibernate alone (or with JBoss Embedded) seems to be a
>very good idea in my opinion for all sorts of reasons.
>
>Too bad it's seemingly not production ready...
>
>Hibernate alone would be clever since it is very well tested and
>there is no point in reinventing the wheel.
>
>/ Henrik
>
>
>-----Ursprungligt meddelande-----
>Från: Michael McCutcheon [mailto:michaelm001@frontiernet.net]
>Skickat: den 18 november 2005 05:33
>Till: Derby Discussion
>Ämne: Is Hibernate inappropriate for embedded databases?
>
>
>I'm still struggling to figure out if I should go with straight JDBC,
>stored procedures or something like Hibernate for my data access/update
>in my web app running on Tomcat.
>
>I've been looking at Hibernate, and it seems that its claim to fame is
>all of the fancy caching it can do to speed performance.
>
>However, doesn't derby itself cache data in memory (once the data is
>accessed)?
>
>If the caching functionality in Hibernate is used for an embedded
>database, doesn't that just mean that the data will be cached twice
>(once in derby, once in hibernate), resulting in memory bloat and
>potentially worse performance?
>
>Any thoughts on this issue would be appreciated.
>
>Mike
>
>
>
>
>
>
SV: Is Hibernate inappropriate for embedded databases?
Posted by Henrik Johansson <he...@vocab.se>.
I tried JBoss Embedded which uses Hibernate for a small test
and it worked like a charm with Derby.
Using Hibernate alone (or with JBoss Embedded) seems to be a
very good idea in my opinion for all sorts of reasons.
Too bad it's seemingly not production ready...
Hibernate alone would be clever since it is very well tested and
there is no point in reinventing the wheel.
/ Henrik
-----Ursprungligt meddelande-----
Från: Michael McCutcheon [mailto:michaelm001@frontiernet.net]
Skickat: den 18 november 2005 05:33
Till: Derby Discussion
Ämne: Is Hibernate inappropriate for embedded databases?
I'm still struggling to figure out if I should go with straight JDBC,
stored procedures or something like Hibernate for my data access/update
in my web app running on Tomcat.
I've been looking at Hibernate, and it seems that its claim to fame is
all of the fancy caching it can do to speed performance.
However, doesn't derby itself cache data in memory (once the data is
accessed)?
If the caching functionality in Hibernate is used for an embedded
database, doesn't that just mean that the data will be cached twice
(once in derby, once in hibernate), resulting in memory bloat and
potentially worse performance?
Any thoughts on this issue would be appreciated.
Mike
Is Hibernate inappropriate for embedded databases?
Posted by Michael McCutcheon <mi...@frontiernet.net>.
I'm still struggling to figure out if I should go with straight JDBC,
stored procedures or something like Hibernate for my data access/update
in my web app running on Tomcat.
I've been looking at Hibernate, and it seems that its claim to fame is
all of the fancy caching it can do to speed performance.
However, doesn't derby itself cache data in memory (once the data is
accessed)?
If the caching functionality in Hibernate is used for an embedded
database, doesn't that just mean that the data will be cached twice
(once in derby, once in hibernate), resulting in memory bloat and
potentially worse performance?
Any thoughts on this issue would be appreciated.
Mike
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Kevin Hore <kh...@araxis.com>.
Hi Michael,
Thank you for your suggestions. You've obviously very kindly given it
quite a bit of thought.
I'd already tried a bunch of the things that you have suggested.
Defining a composite key and removing the existing indexes did nothing
to improve performance. Re-writing using an IN clause produced no
improvement (indeed, I'm pretty sure that this is what Derby is doing
anyway).
I ought to point out here that my comments on how the optimizer would
re-write my query are based on the Derby documentation found at
http://db.apache.org/derby/docs/10.1/tuning/rtuntransform582.html
Unfortunately (for me) this is merely one simple example of a whole
class of problems with different queries involving different tables in
the system I'm working on, and the problem is fundamentally that
multiple IN/WHERE terms aren't being handled sensibly, as summarized by
Satheesh Bandaram in his post. Derby is able to choose a sensible
strategy with just one term in the WHERE/IN clause -- it's just the case
of multiple terms where it chooses inappropriate strategies.
I will add my thoughts to DERBY-47 regarding this.
Thanks again,
Kevin Hore
Michael Segel wrote:
> On Friday 11 November 2005 08:59, Kevin Hore wrote:
> Hi,
>
> Before complaining about the query optimizer, let's look at your design first.
> Yeah, I know that Derby, like every other database, is not perfect and could
> always use improvements. But the reality is that the first place to look for
> improvements is in your own code.
>
> Since you've got a small table of only 260K rows, let's try these simple
> experiments to see if they will help.
>
>
>>----PROBLEM DESCRIPTION----
>>
>>Consider the table:
>>
>>CREATE TABLE tblSearchDictionary
>>(
>>ObjectId int NOT NULL,
>>ObjectType int NOT NULL,
>>Word VARCHAR(64) NOT NULL,
>>WordLocation int NOT NULL,
>>CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
>>);
>>
>
> Just a first suggestion, and it's really a nit, could you alter the table to
> have Word as the last column?
>
> CREATE TABLE tblSearchDictionary
> (
> objectId INT NOT NULL,
> objectType INT NOT NULL,
> wordLocation INT NOT NULL,
> word VARCHAR(64) NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> );
>
>>This table has an index on each of the four columns, it also has the
>>unique index across all four columns as defined above:
>>
>
> Ok, first suggestion: drop all of your indexes.
> Based on your examples, these indexes are part of the problem.
>
> For example an index on objectType doesn't make any sense.
> First, in your example, you don't even use objectType in any of your queries.
> Second, how many objectTypes are there? 10, 100, 1000? The point is that
> objectType is not a unique enough identifier to justify being its own index.
> The same could be said for wordLocation. So why have an index on this column?
>
>
>
>>CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
>>CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
>>(ObjectType);
>>CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
>>CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
>>(WordLocation);
>>
>
>
> Ok, since you believe that the Optimizer is having problems selecting the
> correct Index to use, let's create your new indexes one by one to see how they
> can improve performance.
>
> Let's review your query statements:
>
>
>>The following query selects all rows that match instances of string in
>>the Word column. It sums the WordLocation column having grouped by the
>>ObjectId column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT'
>>GROUP BY ObjectId;
>>
>>On my machine this will usually complete in an acceptable time of around
>>200ms.
>>
>
> Ok, this is pretty straightforward enough.
>
> A good index would be a composite index on objectID and word.
>
>
>>Now consider the following query which adds a second search term on the
>>same column.
>>
>>SELECT ObjectId, SUM(WordLocation) AS Score
>>FROM tblSearchDictionary
>>WHERE Word = 'CONTACT' OR Word = 'ADD'
>>GROUP BY ObjectId;
>>
>
>
> Ok, again if you create a composite index on objectID and Word, you will get
> better performance.
>
> As you point out below, the query may get rewritten. If you wanted to, you
> could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD')" but
> I'd save that for an option to test as a further tweak.
>
>
> So, as a test, do the following:
>
> 1) DROP ALL CURRENT INDEXes.
> 2) CREATE a composite index on objectID and Word.
>
> Run your query tests to see what happens to performance.
>
> 3) To test your theory about the optimizer, rewrite your second query to use
> the IN (xxx,xxx) option rather than OR.
>
> In addition:
>
> 1) Recreate your table with your VARCHAR column as the last column.
> 2) Alter your table and replace your VARCHAR column with a fixed length
> CHAR(64) column.
>
> If you can, please post your results.
> Do each change and see what happens to your query times.
>
> HTH
>
> -Gumby
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Michael Segel <ms...@segel.com>.
On Saturday 12 November 2005 16:04, Michael Segel wrote:
> On Saturday 12 November 2005 10:28, Michael Segel wrote:
> > CREATE TABLE tblSearchDictionary
> > (
> > objectId INT NOT NULL,
> > objectType INT NOT NULL,
> > wordLocation INT NOT NULL,
> > word VARCHAR(64) NOT NULL,
> > CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> > );
> >
> > > This table has an index on each of the four columns, it also has the
> > > unique index across all four columns as defined above:
>
> As a follow up...
> Change the constraint order to UNIQUE(objectId, word, wordLocation,
> objectType);
>
> You probably can drop the index I asked you to create if you make this mod.
> Then your query should use the unique constraint's "backing index"...
>
> HTH.
>
> -G
If you want to make this a little bit more efficient, you could try the
following:
1) Create two tables.
CREATE TABLE wordKey
(
wordID INT NOT NULL GENERATED ALWAYS AS IDENTITY,
word VARCHAR(128) NOT NULL,
CONSTRAINT wordKeyCONS01 UNIQUE (wordID, word)
);
// Note this should create the backing INDEX so you don't need to create an
INDEX
CREATE TABLE tblSearchDictionary
(
objectId INT NOT NULL,
objectType INT NOT NULL,
wordLocation INT NOT NULL,
wordID INT NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (objectId,wordID, wordLocation, objectType)
);
// Note: I didn't see the SQL syntax for creating a foreign key on wordID, but
you'll want to do that, and also set up cascading deletes, even though you may
not ever delete a word from wordKey...
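A sketch of what that foreign key might look like (the constraint names here are invented; note that Derby only lets a foreign key reference a column backed by a primary key or a standalone unique constraint, so wordID is given its own primary key first):

```sql
-- Hypothetical constraint names; assumes wordID alone is made a
-- primary key so it can be the target of the foreign key.
ALTER TABLE wordKey
  ADD CONSTRAINT wordKeyPK PRIMARY KEY (wordID);

ALTER TABLE tblSearchDictionary
  ADD CONSTRAINT tblSearchDictionaryWordFK
  FOREIGN KEY (wordID) REFERENCES wordKey (wordID)
  ON DELETE CASCADE;
```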
It looks like you're trying to build your own word-search index of documents.
If so, you'll want to automate it rather than use static queries.
So let's say you're getting a text input of "CONSTRAINT OR BAR AND FOO".
I would interpret this to mean select documents that either contain the word
"CONSTRAINT" or the words "FOO" AND "BAR".
Note that you're going to create a parse tree that says node one contains 1
item, "CONSTRAINT", node two contains 2 items, "FOO", "BAR"...
So you would want to create a SELECT statement that is equivalent to:
SELECT objectID, SUM(wordLocation) SCORE
FROM tblSearchDictionary
WHERE wordID = xxx
UNION (
SELECT objectID, SUM(wordLocation) SCORE
FROM tblSearchDictionary
WHERE wordID IN (yyy,zzz)
)...
GROUP BY objectID
Where xxx, yyy, zzz are integer values that you got from your lookup.
While this isn't the complete solution, it should point you in the correct
direction and should work independent of some kludges in the optimizer.
Your wordKey look up statement would be a prepared statement, while your
actual scoring routine would be a dynamic statement.
This should give you what you want.
I believe that there may be more to this; for example, what happens if you have
a document that contains all three words...
I'd also recommend the lookup table as a way to decrease the size of your
overall solution and it may change how your optimizer looks at things.
(You're doing your main query all on ints...)
But hey, what do I know? I am just having my first cup of joe and you get what
you paid for... ;-)
-G
--
Michael Segel
Principal
MSCC
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Michael Segel <ms...@segel.com>.
On Saturday 12 November 2005 10:28, Michael Segel wrote:
> CREATE TABLE tblSearchDictionary
> (
> objectId INT NOT NULL,
> objectType INT NOT NULL,
> wordLocation INT NOT NULL,
> word VARCHAR(64) NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
> );
>
> > This table has an index on each of the four columns, it also has the
> > unique index across all four columns as defined above:
>
As a follow up...
Change the constraint order to UNIQUE(objectId, word, wordLocation,
objectType);
You probably can drop the index I asked you to create if you make this mod.
Then your query should use the unique constraint's "backing index"...
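In SQL, that follow-up amounts to something like this (a sketch; the constraint has to be dropped and re-added to reorder its columns):

```sql
-- Reorder the unique constraint so its backing index leads with the
-- columns the query actually filters and groups on.
ALTER TABLE tblSearchDictionary DROP CONSTRAINT CONSd0e222;
ALTER TABLE tblSearchDictionary
  ADD CONSTRAINT CONSd0e222
  UNIQUE (objectId, word, wordLocation, objectType);
```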
HTH.
-G
--
Michael Segel
Principal
MSCC
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Michael Segel <ms...@segel.com>.
On Friday 11 November 2005 08:59, Kevin Hore wrote:
Hi,
Before complaining about the query optimizer, let's look at your design first.
Yeah, I know that Derby, like every other database, is not perfect and could
always use improvements. But the reality is that the first place to look for
improvements is in your own code.
Since you've got a small table of only 260K rows, let's try these simple
experiments to see if they will help.
> ----PROBLEM DESCRIPTION----
>
> Consider the table:
>
> CREATE TABLE tblSearchDictionary
> (
> ObjectId int NOT NULL,
> ObjectType int NOT NULL,
> Word VARCHAR(64) NOT NULL,
> WordLocation int NOT NULL,
> CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
> );
>
Just a first suggestion, and it's really a nit, could you alter the table to
have Word as the last column?
CREATE TABLE tblSearchDictionary
(
objectId INT NOT NULL,
objectType INT NOT NULL,
wordLocation INT NOT NULL,
word VARCHAR(64) NOT NULL,
CONSTRAINT CONSd0e222 UNIQUE (objectId,objectType,word,wordLocation)
);
> This table has an index on each of the four columns, it also has the
> unique index across all four columns as defined above:
>
Ok, first suggestion: drop all of your indexes.
Based on your examples, these indexes are part of the problem.
For example an index on objectType doesn't make any sense.
First, in your example, you don't even use objectType in any of your queries.
Second, how many objectTypes are there? 10, 100, 1000? The point is that
objectType is not a unique enough identifier to justify being its own index.
The same could be said for wordLocation. So why have an index on this column?
> CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary (ObjectId);
> CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
> (ObjectType);
> CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
> CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
> (WordLocation);
>
Ok, since you believe that the Optimizer is having problems selecting the
correct Index to use, let's create your new indexes one by one to see how they
can improve performance.
Let's review your query statements:
> The following query selects all rows that match instances of string in
> the Word column. It sums the WordLocation column having grouped by the
> ObjectId column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> GROUP BY ObjectId;
>
> On my machine this will usually complete in an acceptable time of around
> 200ms.
>
Ok, this is pretty straightforward enough.
A good index would be a composite index on objectID and word.
> Now consider the following query which adds a second search term on the
> same column.
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;
>
Ok, again if you create a composite index on objectID and Word, you will get
better performance.
As you point out below, the query may get rewritten. If you wanted to, you
could just rewrite your query to say "WHERE word IN ('CONTACT', 'ADD')" but
I'd save that for an option to test as a further tweak.
So, as a test, do the following:
1) DROP ALL CURRENT INDEXes.
2) CREATE a composite index on objectID and Word.
Run your query tests to see what happens to performance.
3) To test your theory about the optimizer, rewrite your second query to use
the IN (xxx,xxx) option rather than OR.
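Concretely, steps 1 and 2 would be something like the following (the new composite index name is invented for the example):

```sql
-- Step 1: drop the existing single-column indexes.
DROP INDEX tblSearchDictionaryObjectId;
DROP INDEX tblSearchDictionaryObjectType;
DROP INDEX tblSearchDictionaryWord;
DROP INDEX tblSearchDictionaryWordLocation;

-- Step 2: one composite index covering the query's grouping and
-- predicate columns (hypothetical name).
CREATE INDEX tblSearchDictionaryObjWord
  ON tblSearchDictionary (objectId, word);
```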
In addition:
1) Recreate your table with your VARCHAR column as the last column.
2) Alter your table and replace your VARCHAR column with a fixed length
CHAR(64) column.
If you can, please post your results.
Do each change and see what happens to your query times.
HTH
-Gumby
--
Michael Segel
Principal
MSCC
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Kevin Hore <kh...@araxis.com>.
Hi Jeffrey,
Thank you for your response. I think you are probably right about Derby
not having a strategy for doing multiple scans. I think DERBY-47
probably covers this, but I'll perhaps add a note to that covering my
circumstances.
I had thought of using UNION, but unfortunately the IN clause must cope
with a varying number of values, and I don't want the system to be
constructing complex statements on-the-fly to avoid this particular
Derby problem. We have quite a number of queries affected by this, not
just the (simple) example under consideration, so using UNION would
require major surgery to our application.
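To illustrate the point: a search with just three terms, rewritten along the lines Jeffrey suggested, already grows to something like the statement below, and every extra term adds another branch (the third term 'REMOVE' is invented for the example):

```sql
SELECT ObjectId, SUM(WordLocation) AS Score
FROM
  (SELECT ObjectId, WordLocation
   FROM tblSearchDictionary
   WHERE Word = 'CONTACT'
   UNION ALL
   SELECT ObjectId, WordLocation
   FROM tblSearchDictionary
   WHERE Word = 'ADD'
   UNION ALL
   SELECT ObjectId, WordLocation
   FROM tblSearchDictionary
   WHERE Word = 'REMOVE') t
GROUP BY ObjectId;
```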
Thanks again.
Kind regards
Kevin Hore
Jeffrey Lichtman wrote:
>
>> I've described the problem in detail below, and I'd appreciate any
>> assistance. Specifically:
>>
>> i) Does anyone have any plans to fix this problem?
>
>
> I believe the real problem is that Derby doesn't have any strategy for
> doing multiple scans for OR/IN clauses. This is a useful feature, but I
> don't know if anyone has entered an enhancement request.
>
>> ii) In the meantime, are there any work-arounds? I'd appreciate any
>> suggestions that would decrease the execution time of my second query
>> below (the one with two search terms). Likewise, any general
>> strategies for avoiding this problem with IN clauses would be
>> appreciated.
>
>
> You could re-write the query to use UNION:
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT' OR Word = 'ADD'
> GROUP BY ObjectId;
>
> would become:
>
> SELECT ObjectId, SUM(WordLocation) AS Score
> FROM
> (SELECT ObjectId, WordLocation
> FROM tblSearchDictionary
> WHERE Word = 'CONTACT'
> UNION ALL
> SELECT ObjectId, WordLocation
> FROM tblSearchDictionary
> WHERE Word = 'ADD') t
> GROUP BY ObjectId;
>
>
> - Jeff Lichtman
> swazoo@rcn.com
> Check out Swazoo Koolak's Web Jukebox at
> http://swazoo.com/
Re: Poor query optimizer choices is making Derby unusable for large tables
Posted by Jeffrey Lichtman <sw...@rcn.com>.
>I've described the problem in detail below, and I'd appreciate any
>assistance. Specifically:
>
>i) Does anyone have any plans to fix this problem?
I believe the real problem is that Derby doesn't have any strategy
for doing multiple scans for OR/IN clauses. This is a useful feature,
but I don't know if anyone has entered an enhancement request.
>ii) In the meantime, are there any work-arounds? I'd appreciate any
>suggestions that would decrease the execution time of my second
>query below (the one with two search terms). Likewise, any
>general strategies for avoiding this problem with IN clauses would
>be appreciated.
You could re-write the query to use UNION:
SELECT ObjectId, SUM(WordLocation) AS Score
FROM tblSearchDictionary
WHERE Word = 'CONTACT' OR Word = 'ADD'
GROUP BY ObjectId;
would become:
SELECT ObjectId, SUM(WordLocation) AS Score
FROM
(SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'CONTACT'
UNION ALL
SELECT ObjectId, WordLocation
FROM tblSearchDictionary
WHERE Word = 'ADD') t
GROUP BY ObjectId;
- Jeff Lichtman
swazoo@rcn.com
Check out Swazoo Koolak's Web Jukebox at
http://swazoo.com/