Posted to issues@spark.apache.org by "Jens Rabe (JIRA)" <ji...@apache.org> on 2018/02/12 12:50:00 UTC

[jira] [Updated] (SPARK-23395) Add an option to return an empty DataFrame from an RDD generated by a Hadoop file when there are no usable paths

     [ https://issues.apache.org/jira/browse/SPARK-23395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jens Rabe updated SPARK-23395:
------------------------------
    Target Version/s: 2.2.1, 2.2.0  (was: 2.2.0, 2.2.1)
             Summary: Add an option to return an empty DataFrame from an RDD generated by a Hadoop file when there are no usable paths  (was: Add an option to return an empty DataFrame from an RDD generated by a Hadoop file)

> Add an option to return an empty DataFrame from an RDD generated by a Hadoop file when there are no usable paths
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23395
>                 URL: https://issues.apache.org/jira/browse/SPARK-23395
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: Jens Rabe
>            Priority: Minor
>              Labels: DataFrame, HadoopInputFormat, RDD
>
> When working with file-based data in custom formats, Spark's ability to use Hadoop's FileInputFormats is very handy. However, when the path they are pointed at contains no usable data, they throw an IOException with the message "No input paths specified in job".
> It would be a nice feature if the DataFrame API could catch this error and return an empty DataFrame instead of failing the job.
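
Until such an option exists, the requested behaviour can be approximated on the caller's side. The following is only a minimal sketch of that workaround, not Spark's API: `load_or_empty`, `load` and `make_empty` are hypothetical names introduced here for illustration, and matching on the message text is an assumption based on the error string quoted in the description above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Message text quoted in the issue description.
NO_INPUT_PATHS = "No input paths specified in job"


def load_or_empty(load: Callable[[], T], make_empty: Callable[[], T]) -> T:
    """Return load(), or make_empty() if load() fails for lack of input paths.

    Both arguments are hypothetical zero-argument callables supplied by the
    caller; nothing here is part of Spark itself.
    """
    try:
        return load()
    except Exception as exc:
        # Assumption: the error is recognised by its message text, since a
        # Java IOException may reach Python wrapped in another exception type.
        if NO_INPUT_PATHS in str(exc):
            return make_empty()
        # Any other failure is unrelated to missing paths and is re-raised.
        raise
```

In a PySpark job, `load` might wrap a call to `spark.sparkContext.newAPIHadoopFile(...)` plus the conversion to a DataFrame, and `make_empty` might be `lambda: spark.createDataFrame([], schema)` with an explicit schema. Because RDDs are evaluated lazily, the error may only surface once partitions are computed, so `load` would need to force that step for the fallback to take effect.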



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
