Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:54:06 UTC

[jira] [Updated] (SPARK-30427) Add config item for limiting partition number when calculating statistics through File System

     [ https://issues.apache.org/jira/browse/SPARK-30427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30427:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Add config item for limiting partition number when calculating statistics through File System
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30427
>                 URL: https://issues.apache.org/jira/browse/SPARK-30427
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Hu Fuwang
>            Priority: Major
>
> Currently, when Spark needs to calculate the statistics (e.g. sizeInBytes) of a table's partitions through the file system (e.g. HDFS), it does not consider the number of partitions. If the number of partitions is huge, calculating the statistics takes a long time, and the result may not be that useful.
> It would be reasonable to add a config item that limits the number of partitions for which statistics may be calculated through the file system.
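
The proposal above can be illustrated with a minimal sketch. It is not Spark's implementation: the function names, the max_partitions parameter (standing in for the proposed config item) and the default_size fallback are all hypothetical, and a local directory walk stands in for an HDFS scan. The idea is only that the file system is consulted when the partition count is within the limit, and a default estimate is returned otherwise.

```python
import os
from typing import Iterable


def directory_size(path: str) -> int:
    """Sum the sizes of all regular files under `path`.

    Stand-in for a file system scan of one partition directory.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def estimate_size_in_bytes(
    partition_paths: Iterable[str],
    max_partitions: int,   # hypothetical: the proposed config item
    default_size: int,     # hypothetical: fallback estimate when the scan is skipped
) -> int:
    """Return the total partition size, or `default_size` if there are too many partitions."""
    paths = list(partition_paths)
    if len(paths) > max_partitions:
        # Too many partitions: skip the potentially expensive scan.
        return default_size
    return sum(directory_size(p) for p in paths)
```

With this shape, a table that has more partitions than the limit never triggers a file system scan, so the cost of estimating its size stays bounded regardless of how many partitions it has.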



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org