Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/10/11 05:04:20 UTC

[jira] [Commented] (SPARK-17858) Provide option for Spark SQL to skip corrupt files

    [ https://issues.apache.org/jira/browse/SPARK-17858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564479#comment-15564479 ] 

Sean Owen commented on SPARK-17858:
-----------------------------------

Yeah, the related JIRA argues that we shouldn't do this: if corrupt data doesn't fail the query, it becomes much easier to silently ignore it. I'm not sure this is a good idea.

> Provide option for Spark SQL to skip corrupt files
> --------------------------------------------------
>
>                 Key: SPARK-17858
>                 URL: https://issues.apache.org/jira/browse/SPARK-17858
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Shixiong Zhu
>
> In Spark 2.0, corrupt files will fail a SQL query. However, the user may just want to skip corrupt files and still run the query.
> Another pain point is that the current exception doesn't contain the paths of the corrupt files, which makes it hard for users to fix them.
> Note: In Spark 1.6, Spark SQL always skips corrupt files because of SPARK-17850.


