You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Harshvardhan Gupta (JIRA)" <ji...@apache.org> on 2017/06/28 19:40:00 UTC

[jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.

    [ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067113#comment-16067113 ] 

Harshvardhan Gupta commented on DERBY-6942:
-------------------------------------------

The estimateCost function in FromBaseTable.java is the guts of the cost and cardinality estimation logic for base table predicates.

The function is called for each of conglomerate (i.e each of the access path possible on the table such as indexes, full table scan). For predicates that can be utilized as start or stop predicates for the given conglomerate (refer http://db.apache.org/derby/docs/10.13/tuning/ctunoptimz24840.html), row estimates are obtained through the store by giving the actual constant keys specified in the query, the estimate by row is quite accurate as it finds out the fraction of values between the start and stop keys.

For all other remaining predicates one of the two things happen:

1) If an equality op is specified and statistics exist for the conglomerate (and it is not a valid start/stop predicate), we query statistics for selectivity.(it takes into account number of unique columns)

2) Hard wired selectivities are used for all other cases.


As part of DERBY-6940, we started collecting extra statistics such as null count, min and max value for each of the columns of the index. We would like to utilise them to eliminate usage of hard wired selectivities whenever possible.

For example - 

Let us say we have a table and an associated index created using the following statement - 

create table abcd(id integer);
create index idx on abcd(id);

Now, suppose a query is issued - 

select * from abcd where id is NULL;

There are two possible access paths for the query and both are considered by the query optimizer - 

1) Index Scan - The store is able to give away fairly accurate row estimates as 'NULL' is a valid start as well as stop key.

2) Table scan - Here, hard wired selectivity estimates are used for row estimates as discussed above.

> Utilise additional statistics for selectivity estimates.
> --------------------------------------------------------
>
>                 Key: DERBY-6942
>                 URL: https://issues.apache.org/jira/browse/DERBY-6942
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)