Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/11/24 17:56:00 UTC

[jira] [Assigned] (SPARK-26161) Ignore empty files in load

     [ https://issues.apache.org/jira/browse/SPARK-26161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26161:
------------------------------------

    Assignee: Apache Spark

> Ignore empty files in load
> --------------------------
>
>                 Key: SPARK-26161
>                 URL: https://issues.apache.org/jira/browse/SPARK-26161
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Apache Spark
>            Priority: Minor
>
> Currently, empty files are opened in load, and Spark tries to read data from them. In some cases, such empty files produce empty partitions, for example with the *wholetext* option of the Text datasource and the *multiLine* mode of the CSV/JSON datasources. This behaviour is unnecessary: empty files can be skipped on read, which can reduce the number of tasks submitted for loading empty files.
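
To illustrate the idea in the description, here is a minimal sketch in plain Python. It is not Spark's code; `plan_whole_file_tasks` and `demo` are hypothetical names made up for this example. It assumes a mode where each file is read as a whole (as with wholetext or multiLine), so one task is planned per file, and shows how dropping zero-length files before planning lowers the task count.

```python
import os
import tempfile


def plan_whole_file_tasks(paths, skip_empty=True):
    """Return one (path, size) task per file.

    Hypothetical helper for illustration only; not part of Spark.
    When skip_empty is True, zero-length files get no task at all.
    """
    tasks = []
    for path in paths:
        size = os.path.getsize(path)
        if skip_empty and size == 0:
            continue  # nothing to read, so no task is planned
        tasks.append((path, size))
    return tasks


def demo():
    """Create one non-empty and two empty files, then plan tasks both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, content in [("a.json", '{"x": 1}\n'), ("b.json", ""), ("c.json", "")]:
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(content)
            paths.append(path)
        before = plan_whole_file_tasks(paths, skip_empty=False)
        after = plan_whole_file_tasks(paths, skip_empty=True)
        return len(before), len(after)
```

Calling `demo()` returns the task counts without and with the filter, so the effect of skipping empty files can be compared directly. How Spark itself decides which files to skip may differ from this size check.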



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
