Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2018/12/02 02:30:00 UTC

[jira] [Assigned] (SPARK-26161) Ignore empty files in load

     [ https://issues.apache.org/jira/browse/SPARK-26161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-26161:
-----------------------------------

    Assignee: Maxim Gekk

> Ignore empty files in load
> --------------------------
>
>                 Key: SPARK-26161
>                 URL: https://issues.apache.org/jira/browse/SPARK-26161
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>
> Currently, empty files are opened in load, and Spark tries to read data from them. In some cases, empty partitions are produced from such empty files, for example with the *wholetext* option of the Text datasource and the *multiLine* mode of the CSV/JSON datasources. This behaviour is unnecessary: empty files can be skipped on read, which reduces the number of tasks submitted for loading empty files.
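
The idea in the description can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation and uses no Spark API; `non_empty_files` and `plan_read_tasks` are hypothetical names, and the one-task-per-file model is an assumption that only mimics whole-file modes such as wholetext or multiLine.

```python
import os


def non_empty_files(paths):
    """Keep only the paths whose size on disk is greater than zero bytes."""
    return [p for p in paths if os.path.getsize(p) > 0]


def plan_read_tasks(paths):
    """Hypothetical planner: one (path, length) read task per non-empty file.

    Zero-length files are dropped before any task is created, so no task
    (and no empty partition) is produced for them.
    """
    return [(p, os.path.getsize(p)) for p in non_empty_files(paths)]
```

With a listing that contains one empty and one non-empty file, `plan_read_tasks` would return a single task for the non-empty file, whereas planning without the filter would yield two.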


