Posted to issues@spark.apache.org by "Márcio Furlani Carmona (JIRA)" <ji...@apache.org> on 2018/02/01 22:22:00 UTC

[jira] [Created] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

Márcio Furlani Carmona created SPARK-23308:
----------------------------------------------

             Summary: ignoreCorruptFiles should not ignore retryable IOException
                 Key: SPARK-23308
                 URL: https://issues.apache.org/jira/browse/SPARK-23308
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1, 2.3.1
            Reporter: Márcio Furlani Carmona


When `spark.sql.files.ignoreCorruptFiles` is enabled, Spark ignores any RuntimeException or IOException thrown while reading a file and treats the file as corrupt. However, some IOExceptions can occur even when the file is not corrupted.

One example is SocketTimeoutException: the read can be retried and may then succeed, so the exception by itself does not mean the data is corrupted.

 

See: 

https://github.com/apache/spark/blob/e30e2698a2193f0bbdcd4edb884710819ab6397c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L163
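
To make the requested distinction concrete, here is a minimal Java sketch. It is not Spark's code and not a proposed patch: the class RetryableRead, the IoSupplier interface, the method names and the attempt limit are all hypothetical. It assumes that only SocketTimeoutException is retryable, since that is the one example given above; a real fix would need a more careful list. Retryable failures are retried and finally rethrown, while every other IOException or RuntimeException is still treated as a corrupt file and skipped.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration only; none of these names exist in Spark.
final class RetryableRead {

    /** A read action that may fail with a checked IOException. */
    interface IoSupplier<T> {
        T get() throws IOException;
    }

    // Assumption: only SocketTimeoutException is considered retryable here.
    static boolean isRetryable(Throwable t) {
        return t instanceof SocketTimeoutException;
    }

    /**
     * Runs the read up to maxAttempts times.
     * - Retryable IOException: try again; rethrow it once attempts run out.
     * - Any other IOException or RuntimeException: treat the file as corrupt
     *   and return whenCorrupt (mimicking the "ignore" behaviour).
     */
    static <T> T readIgnoringCorrupt(IoSupplier<T> read, int maxAttempts, T whenCorrupt)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastRetryable = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (IOException e) {
                if (!isRetryable(e)) {
                    return whenCorrupt; // non-retryable: skip as corrupt
                }
                lastRetryable = e; // retryable: loop and try again
            } catch (RuntimeException e) {
                return whenCorrupt; // skip as corrupt
            }
        }
        // Every attempt hit a retryable error: surface it instead of hiding it.
        throw lastRetryable;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated read that times out twice and then succeeds.
        String value = readIgnoringCorrupt(() -> {
            if (calls[0]++ < 2) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "data";
        }, 3, "skipped");
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

The point of rethrowing after the last attempt is that a persistent timeout then fails the task visibly instead of silently dropping the file's rows.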



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
