Posted to issues@spark.apache.org by "Hu Fuwang (Jira)" <ji...@apache.org> on 2020/01/09 09:21:00 UTC

[jira] [Updated] (SPARK-30469) Partition columns should not be involved when calculating sizeInBytes of Project logical plan

     [ https://issues.apache.org/jira/browse/SPARK-30469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hu Fuwang updated SPARK-30469:
------------------------------
    Summary: Partition columns should not be involved when calculating sizeInBytes of Project logical plan  (was: Hive Partition columns should not be involved when calculating sizeInBytes of Project logical plan)

> Partition columns should not be involved when calculating sizeInBytes of Project logical plan
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30469
>                 URL: https://issues.apache.org/jira/browse/SPARK-30469
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Hu Fuwang
>            Priority: Major
>
> When computing the statistics of a Project logical plan with CBO disabled, Spark calls SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode to estimate the size in bytes. That method scales the child plan's sizeInBytes by the ratio of the Project plan's row size to the child plan's row size.
> The row size is computed from the output attributes (columns). Currently, SizeInBytesOnlyStatsPlanVisitor.visitUnaryNode also counts the partition columns of a Hive table, which is not reasonable, because Hive partition columns do not contribute to sizeInBytes.
> As a result, the estimated sizeInBytes may be inaccurate.
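
The effect described above can be illustrated with a small, self-contained Python model. This is only a sketch of the ratio-based estimate, not Spark's code: the names Column, row_size and project_size_in_bytes are hypothetical, and the 8-byte per-row overhead, the column sizes and the 1,000,000-byte table size are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed fixed per-row overhead; it also keeps the divisor non-zero
# when a plan has no columns.
ROW_OVERHEAD = 8


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for an output attribute of a plan."""
    name: str
    default_size: int          # assumed size in bytes of one value
    is_partition: bool = False


def row_size(columns: Sequence[Column], skip_partition_columns: bool = False) -> int:
    """Estimated bytes per row, optionally ignoring partition columns."""
    return ROW_OVERHEAD + sum(
        c.default_size
        for c in columns
        if not (skip_partition_columns and c.is_partition)
    )


def project_size_in_bytes(
    child_size: int,
    child_columns: Sequence[Column],
    output_columns: Sequence[Column],
    skip_partition_columns: bool = False,
) -> int:
    """Scale the child's size by the ratio of output row size to child row size."""
    child_row = row_size(child_columns, skip_partition_columns)
    output_row = row_size(output_columns, skip_partition_columns)
    # Never report zero, so a product of sizes further up the plan stays non-zero.
    return max(child_size * output_row // child_row, 1)


# Assumed example table: two data columns plus one partition column.
id_col = Column("id", 8)
payload_col = Column("payload", 20)
dt_col = Column("dt", 20, is_partition=True)

table_columns = [id_col, payload_col, dt_col]
projected = [id_col]                 # a Project selecting only `id`
table_size = 1_000_000               # assumed sizeInBytes of the child plan

with_partition = project_size_in_bytes(table_size, table_columns, projected)
without_partition = project_size_in_bytes(
    table_size, table_columns, projected, skip_partition_columns=True
)
print(with_partition, without_partition)
```

In this model, counting the partition column inflates the child's row size, so the Project's estimate comes out smaller than it does when the partition column is left out of the ratio.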



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
