Posted to issues@drill.apache.org by "Gautam Parai (JIRA)" <ji...@apache.org> on 2019/05/01 01:33:00 UTC
[jira] [Updated] (DRILL-7231) TPCDS-21 regresses after fix for DRILL-7148
[ https://issues.apache.org/jira/browse/DRILL-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gautam Parai updated DRILL-7231:
--------------------------------
Description:
The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.
One of the fixes for DRILL-7148 introduced a change in DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when not all columns in the group-by key have NDV statistics. However, the fix was incorrect and instead caused it to use the guesstimated NDV even when statistics were present.
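The difference between the intended and the regressed behavior can be sketched as follows. This is an illustrative Python model only, not the code in DrillRelMDDistinctRowcount: the function names, the `ndv_stats` mapping, and the rule for combining per-column NDVs (a product capped at the input row count) are assumptions made for the example.

```python
from math import prod
from typing import Mapping, Sequence

GUESS_FRACTION = 0.1  # fallback factor quoted in the description


def estimate_group_ndv(input_row_count: int,
                       group_by_columns: Sequence[str],
                       ndv_stats: Mapping[str, int]) -> float:
    """Intended behavior: guess only when some key column lacks an NDV statistic."""
    ndvs = [ndv_stats.get(col) for col in group_by_columns]
    if any(ndv is None for ndv in ndvs):
        return GUESS_FRACTION * input_row_count
    # Assumption for this sketch: combine per-column NDVs as a product,
    # capped at the number of input rows.
    return float(min(prod(ndvs), input_row_count))


def estimate_group_ndv_regressed(input_row_count: int,
                                 group_by_columns: Sequence[str],
                                 ndv_stats: Mapping[str, int]) -> float:
    """Regressed behavior as described: the guess is used even when statistics exist."""
    return GUESS_FRACTION * input_row_count
```

With a hypothetical key column `k` that has an NDV statistic of 15 over 400M input rows, the first function returns 15 while the second returns 40M, which is the kind of gap that feeds into the join estimate.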
Because of the regression, the NDV was estimated as 0.1 * input_row_count, so the join cardinality for TPCDS-21 was severely underestimated: 400M * 15 / Max(40M, 15) = 150.
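The arithmetic can be checked with a small sketch of the estimate rows_left * rows_right / max(ndv_left, ndv_right). It reads 400M and 15 as the two input row counts and takes the guessed NDV as 0.1 * 400M = 40M, which is the value that reproduces the stated result of 150. The function name is hypothetical, and the "with statistics" NDV of 15 on the larger side is an assumed value for contrast, not a figure from the issue.

```python
def estimate_join_rows(left_rows: int, right_rows: int,
                       left_key_ndv: float, right_key_ndv: float) -> float:
    """Join cardinality estimate: product of row counts over the larger key NDV."""
    return left_rows * right_rows / max(left_key_ndv, right_key_ndv)


left_rows = 400_000_000   # 400M rows on the larger input
right_rows = 15           # 15 rows on the smaller input

# Regressed path: NDV of the larger side is guessed as 0.1 * row count.
guessed_ndv = left_rows // 10
regressed = estimate_join_rows(left_rows, right_rows, guessed_ndv, 15)

# Contrast (assumed value): if statistics reported an NDV of 15 on both sides.
with_stats = estimate_join_rows(left_rows, right_rows, 15, 15)

print(regressed)   # 150.0
print(with_stats)  # 400000000.0
```

Under these assumptions the guessed NDV shrinks the estimate by several orders of magnitude relative to what statistics-based NDVs would give.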
was: The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.
> TPCDS-21 regresses after fix for DRILL-7148
> -------------------------------------------
>
> Key: DRILL-7231
> URL: https://issues.apache.org/jira/browse/DRILL-7231
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Gautam Parai
> Assignee: Gautam Parai
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The join rowcount regresses a lot after changes made for DRILL-7148. This affects several TPC-DS queries.
> One of the fixes for DRILL-7148 introduced a change in DrillRelMDDistinctRowcount to use the guess of 0.1 * input_row_count only when not all columns in the group-by key have NDV statistics. However, the fix was incorrect and instead caused it to use the guesstimated NDV even when statistics were present.
> Because of the regression, the NDV was estimated as 0.1 * input_row_count, so the join cardinality for TPCDS-21 was severely underestimated: 400M * 15 / Max(40M, 15) = 150.