
Posted to issues@spark.apache.org by "Wang, Gang (JIRA)" <ji...@apache.org> on 2018/12/19 09:33:00 UTC

[jira] [Commented] (SPARK-26375) Rule PruneFileSourcePartitions should be fired before any other rules based on table statistics

    [ https://issues.apache.org/jira/browse/SPARK-26375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724827#comment-16724827 ] 

Wang, Gang commented on SPARK-26375:
------------------------------------

This should be okay: a filter on partition columns is also treated as a normal filter, and its output statistics are estimated in the FilterEstimation class.
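
As a rough illustration of the point above, here is a toy model in Python. It is not Spark's FilterEstimation code; the Stats class, the uniform-distribution assumption, the build-side rule, and all table sizes are hypothetical and exist only to show how an estimate for a partition-column filter can change a statistics-based decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    row_count: int
    size_in_bytes: int


def estimate_equality_filter(stats: Stats, distinct_count: int) -> Stats:
    # Hypothetical estimator: assumes values are uniformly distributed, so
    # `col = literal` keeps 1 / distinct_count of the input.
    return Stats(stats.row_count // distinct_count,
                 stats.size_in_bytes // distinct_count)


def pick_build_side(left: Stats, right: Stats) -> str:
    # Toy cost rule: build the hash table on the smaller input.
    return "left" if left.size_in_bytes <= right.size_in_bytes else "right"


# Hypothetical tables: `fact` is partitioned by `dt` into 100 partitions.
fact = Stats(row_count=1_000_000, size_in_bytes=100_000_000)
dim = Stats(row_count=50_000, size_in_bytes=5_000_000)

# Estimate for `fact WHERE dt = '2018-12-19'`, treated as a normal filter.
filtered_fact = estimate_equality_filter(fact, distinct_count=100)

print(pick_build_side(fact, dim))           # decision from unfiltered stats
print(pick_build_side(filtered_fact, dim))  # decision from filtered estimate
```

In this sketch the decision flips once the filter estimate is taken into account, even though no partitions have actually been pruned yet.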

 

> Rule PruneFileSourcePartitions should be fired before any other rules based on table statistics
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26375
>                 URL: https://issues.apache.org/jira/browse/SPARK-26375
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> In Catalyst, some optimizer rules are based on table statistics, such as ReorderJoin (in which star schemas are detected) and CostBasedJoinReorder. For these rules, the accuracy of the statistics is crucial. However, all of these rules are currently fired before partition pruning, which may result in inaccurate statistics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
