Posted to issues@spark.apache.org by "Maxim Gekk (JIRA)" <ji...@apache.org> on 2018/11/24 17:45:00 UTC

[jira] [Created] (SPARK-26161) Ignore empty files in load

Maxim Gekk created SPARK-26161:
----------------------------------

             Summary: Ignore empty files in load
                 Key: SPARK-26161
                 URL: https://issues.apache.org/jira/browse/SPARK-26161
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk


Currently, empty files are opened during load, and Spark tries to read data from them. In some cases, such empty files produce empty partitions, for example in the *wholetext* mode of the Text datasource and in the *multiLine* mode of the CSV/JSON datasources. This behaviour is unnecessary: empty files can be skipped on read, which reduces the number of tasks submitted for loading them.
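
The idea of skipping empty files before any read task is planned can be illustrated with a minimal, Spark-independent sketch. This is not Spark's implementation; the helper names `non_empty_files` and `plan_tasks` are hypothetical, and the one-task-per-file planner is only an assumption that mimics how whole-file modes such as *wholetext* or *multiLine* treat each file as a single unit.

```python
import os
import tempfile


def non_empty_files(paths):
    """Keep only the files whose on-disk size is greater than zero bytes."""
    return [p for p in paths if os.path.getsize(p) > 0]


def plan_tasks(paths, skip_empty=True):
    """Hypothetical planner: one read task per file (whole-file reading).

    With skip_empty=False every file, including zero-length ones, gets a task,
    which corresponds to the behaviour described in the issue.
    """
    candidates = non_empty_files(paths) if skip_empty else list(paths)
    return [{"file": p} for p in candidates]


if __name__ == "__main__":
    # Build a small directory with one empty and one non-empty file.
    with tempfile.TemporaryDirectory() as d:
        empty = os.path.join(d, "empty.json")
        full = os.path.join(d, "data.json")
        open(empty, "w").close()
        with open(full, "w") as f:
            f.write('{"a": 1}\n')

        print(len(plan_tasks([empty, full], skip_empty=False)))
        print(len(plan_tasks([empty, full], skip_empty=True)))
```

In this sketch the filter relies only on file length, so no file has to be opened to decide whether it can be skipped.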


