Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/10/18 20:00:05 UTC

[jira] [Resolved] (SPARK-6593) Provide option for HadoopRDD to skip corrupted files

     [ https://issues.apache.org/jira/browse/SPARK-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6593.
------------------------------
    Resolution: Won't Fix

> Provide option for HadoopRDD to skip corrupted files
> ----------------------------------------------------
>
>                 Key: SPARK-6593
>                 URL: https://issues.apache.org/jira/browse/SPARK-6593
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.3.0
>            Reporter: Dale Richardson
>            Priority: Minor
>
> When reading a large number of gzip files from HDFS, e.g. with sc.textFile("hdfs:///user/cloudera/logs*.gz"), if the Hadoop input libraries report an exception then the entire job is cancelled. As default behaviour this is probably for the best, but in some circumstances, where you know it would be OK, it would be nice to have the option to skip the corrupted file and continue the job.
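
For reference, a user-side workaround along the lines the reporter describes is sketched below. It is not part of this resolution: the helper name, the pre-expanded path list, and the full-pass count() validation are all assumptions for illustration. (Later Spark releases added a spark.files.ignoreCorruptFiles setting for this case, but nothing equivalent existed on the 1.x line discussed here.)

    import scala.util.Try
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Hypothetical workaround: probe each gzip file with a full pass
    // (count()) and union only the files that decompress cleanly.
    // Expensive -- every good file is read twice -- but the job can
    // continue past corrupted inputs instead of being cancelled.
    def textFileSkippingCorrupt(sc: SparkContext, paths: Seq[String]): RDD[String] = {
      val readable = paths.filter { p =>
        // A corrupted gzip stream throws (e.g. java.io.EOFException)
        // while count() reads through the file.
        Try(sc.textFile(p).count()).isSuccess
      }
      sc.union(readable.map(p => sc.textFile(p)))
    }

The sketch assumes the glob has already been expanded into individual paths (for example via a driver-side listing with the Hadoop FileSystem API), since validating per file is what makes skipping possible.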



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org