You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Alex Vasiliev (JIRA)" <ji...@apache.org> on 2017/10/05 03:40:00 UTC
[jira] [Created] (DERBY-6967) Query Optimizer picks Table Scan based on WHERE predicate, ignores expensive ORDER BY.

Alex Vasiliev created DERBY-6967:
------------------------------------

             Summary: Query Optimizer picks Table Scan based on WHERE predicate, ignores expensive ORDER BY. 
                 Key: DERBY-6967
                 URL: https://issues.apache.org/jira/browse/DERBY-6967
             Project: Derby
          Issue Type: Bug
    Affects Versions: 10.13.1.1
            Reporter: Alex Vasiliev


h5. Problem:

It is appeared that query optimizer mises dramatically for a fairly simple query:
{code:sql}
SELECT * FROM Customer 
WHERE ID >= 0 
ORDER BY ID 
-- optionally: FETCH NEXT 2 ROWS ONLY;
{code}
The "ID >= 0" predicate is expected to return almost entire table (true), so optimizer decides to utilize Table Scan to avoid having to resolve index row references, but it should take into account that then it need to order the result set (sort), which is a way more expensive operation. Especially if it's followed by, let's say, "FETCH NEXT 2 ROWS ONLY" which would explicitly indicate that there is no need to iterate through the entire table.

h5. Steps to recreate:
{code:sql}
CREATE TABLE Customer (
  ID    BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  Name  VARCHAR(1024) NOT NULL
);

INSERT INTO Customer (Name) VALUES ('Alex'), ('Peter'), ('Bob'), ('Fred');

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);

SELECT * FROM Customer WHERE ID >= 0 ORDER BY ID;

VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
{code}

h5. Result:
{code}
Statement Name: 
	null
Statement Text: 
	SELECT * FROM Customer WHERE ID >= 0 ORDER BY ID
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 0
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : null
End Execution Timestamp : null
Statement Execution Plan Text: 
Sort ResultSet:
Number of opens = 1
Rows input = 4
Rows returned = 4
Eliminate duplicates = false
In sorted order = false
Sort information: 
	Number of rows input=4
	Number of rows output=4
	Sort type=internal
	constructor time (milliseconds) = 0
	open time (milliseconds) = 0
	next time (milliseconds) = 0
	close time (milliseconds) = 0
	optimizer estimated row count: 2.97
	optimizer estimated cost: 39.32
Source result set:
	Table Scan ResultSet for CUSTOMER at read committed isolation level using instantaneous share row locking chosen by the optimizer
	Number of opens = 1
	Rows seen = 4
	Rows filtered = 0
	Fetch Size = 16
		constructor time (milliseconds) = 0
		open time (milliseconds) = 0
		next time (milliseconds) = 0
		close time (milliseconds) = 0
		next time in milliseconds/row = 0

	scan information:
		Bit set of columns fetched=All
		Number of columns fetched=2
		Number of pages visited=1
		Number of rows qualified=4
		Number of rows visited=4
		Scan type=heap
		start position:
			null
		stop position:
			null
		qualifiers:
			Column[0][0] Id: 0
			Operator: <
			Ordered nulls: false
			Unknown return value: true
			Negate comparison result: true
		optimizer estimated row count: 2.97
		optimizer estimated cost: 39.32
{code}

^ Note that Table Scan is picked and not an index over ID.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)