Posted to issues@spark.apache.org by "tonydoen (Jira)" <ji...@apache.org> on 2022/03/23 19:29:00 UTC

[jira] [Created] (SPARK-38639) Support ignoreCorruptRecord flag parallel to ignoreCorruptFiles

tonydoen created SPARK-38639:
--------------------------------

             Summary: Support ignoreCorruptRecord flag parallel to ignoreCorruptFiles
                 Key: SPARK-38639
                 URL: https://issues.apache.org/jira/browse/SPARK-38639
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.1, 3.1.2
            Reporter: tonydoen
             Fix For: 3.2.1


The existing flags "spark.sql.files.ignoreCorruptFiles" and "spark.sql.files.ignoreMissingFiles" quietly skip reads from files that are corrupted or missing, but a query can still fail on sequence files.

 

Being able to ignore corrupt records is useful when users want queries to succeed on dirty data (mixed schemas in one table).

 

We would like to add a "spark.sql.hive.ignoreCorruptRecord" flag to fill this gap.
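
To make the intended behavior concrete, here is a minimal sketch in plain Python (not Spark code) of how a record-level flag could differ from the file-level ones: a record that fails to decode is skipped instead of failing the whole read. The names decode, read_records, CorruptRecordError and ignore_corrupt_record are hypothetical and only illustrate the idea; the JSON-based "schema" check is likewise an assumption standing in for whatever deserializer the table uses.

```python
import json
from typing import Iterable, Iterator, Set


class CorruptRecordError(ValueError):
    """Hypothetical error: a record could not be decoded against the schema."""


def decode(raw: str, required_fields: Set[str]) -> dict:
    # Assumption for this sketch: a record is "corrupt" if it is not a
    # JSON object or if it lacks one of the required fields.
    try:
        row = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise CorruptRecordError(str(exc)) from exc
    if not isinstance(row, dict) or not required_fields.issubset(row):
        raise CorruptRecordError(f"schema mismatch: {raw!r}")
    return row


def read_records(
    raws: Iterable[str],
    required_fields: Set[str],
    ignore_corrupt_record: bool = False,
) -> Iterator[dict]:
    # With the flag off, the first corrupt record fails the whole read.
    # With the flag on, corrupt records are dropped and reading continues.
    for raw in raws:
        try:
            yield decode(raw, required_fields)
        except CorruptRecordError:
            if not ignore_corrupt_record:
                raise


if __name__ == "__main__":
    data = ['{"id": 1, "name": "a"}', "not json", '{"id": 2}']
    for row in read_records(data, {"id", "name"}, ignore_corrupt_record=True):
        print(row)
```

In this sketch the default stays strict, so existing behavior would not change unless the flag is turned on explicitly.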



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org