Posted to issues@drill.apache.org by "Gautam Parai (JIRA)" <ji...@apache.org> on 2019/05/01 01:33:00 UTC

[jira] [Updated] (DRILL-7231) TPCDS-21 regresses after fix for DRILL-7148

     [ https://issues.apache.org/jira/browse/DRILL-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Parai updated DRILL-7231:
--------------------------------
    Description: 
The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.

One of the fixes for DRILL-7148 introduced a change in DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when not all columns in the group-by key have NDV statistics. However, the fix was incorrect: it instead caused the guesstimated NDV to be used even when statistics were present.
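A minimal Java sketch of the rule described above, contrasting the intended fallback with the regressed behavior. It is not the actual DrillRelMDDistinctRowcount code. The class, method, and column names are hypothetical, and multiplying per-column NDVs (capped at the input row count) is an assumed combining rule used only for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill source: models the NDV fallback rule from DRILL-7231.
public class DistinctRowCountSketch {

    // Fallback guess when statistics are missing: 10% of the input row count.
    static final double GUESS_FRACTION = 0.1;

    // Intended behavior: use NDV statistics when every group-by column has them.
    // Assumption: per-column NDVs are multiplied and capped at the input row count.
    static double estimateNdv(List<String> groupByCols, Map<String, Double> ndvStats,
                              double inputRowCount) {
        double product = 1.0;
        for (String col : groupByCols) {
            Double ndv = ndvStats.get(col);
            if (ndv == null) {
                // At least one group-by column lacks statistics: use the guess.
                return GUESS_FRACTION * inputRowCount;
            }
            product *= ndv;
        }
        return Math.min(product, inputRowCount);
    }

    // Regressed behavior as described: the guess is returned even when statistics exist.
    static double estimateNdvRegressed(List<String> groupByCols, Map<String, Double> ndvStats,
                                       double inputRowCount) {
        return GUESS_FRACTION * inputRowCount;
    }

    public static void main(String[] args) {
        List<String> key = List.of("key");               // hypothetical column name
        Map<String, Double> stats = Map.of("key", 15.0); // hypothetical statistics
        double inputRows = 400e6;
        System.out.println(estimateNdv(key, stats, inputRows));          // 15.0
        System.out.println(estimateNdvRegressed(key, stats, inputRows)); // 4.0E7
    }
}
```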

Because the regression estimated the NDV as 0.1 * input_row_count (0.1 * 400M = 40M), the join cardinality for TPCDS-21 was severely underestimated: 400M * 15 / Max(40M, 15) = 150.
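The arithmetic can be checked with a small sketch of the join estimate in the form quoted in the description (left rows * right rows / max of the two join-key NDVs). The class and method names are hypothetical; the inputs are the 400M and 15 from the description, with the left-key NDV taken as the 0.1 * input_row_count guess.

```java
// Hypothetical sketch of the join estimate quoted in DRILL-7231:
// rows(left) * rows(right) / max(ndv(left key), ndv(right key)).
public class JoinCardinalitySketch {

    static double joinRowCount(double leftRows, double rightRows,
                               double leftKeyNdv, double rightKeyNdv) {
        return leftRows * rightRows / Math.max(leftKeyNdv, rightKeyNdv);
    }

    public static void main(String[] args) {
        double leftRows = 400e6;  // 400M, as in the description's formula
        double rightRows = 15;    // 15, as in the description's formula
        double rightKeyNdv = 15;  // 15, as in the description's formula

        // Regressed path: left-key NDV guessed as 0.1 * 400M = 40M.
        double guessedNdv = 0.1 * leftRows;
        System.out.println(joinRowCount(leftRows, rightRows, guessedNdv, rightKeyNdv)); // 150.0
    }
}
```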

  was:The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.


> TPCDS-21 regresses after fix for DRILL-7148
> -------------------------------------------
>
>                 Key: DRILL-7231
>                 URL: https://issues.apache.org/jira/browse/DRILL-7231
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)