Posted to issues@spark.apache.org by "Abhishek Madav (Jira)" <ji...@apache.org> on 2020/02/27 22:50:00 UTC

[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams

    [ https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047024#comment-17047024 ] 

Abhishek Madav commented on SPARK-30442:
----------------------------------------

If a task fails, for example because it cannot write to local disk or is interrupted, the output file is left empty but is already materialized on the file system. The next task attempt that retries the write to the same location sees the existing file and fails with a FileAlreadyExistsException, so the write path is not resilient to task failures.
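
The failure mode can be reproduced outside Spark with a minimal Python sketch. It is only an analogy on the local file system, not Spark's or Hadoop's code: opening a file in mode "xb" stands in for creating a stream with overwrite disabled, and `SimulatedTaskFailure`, `write_attempt`, and `retry_outcome` are hypothetical names introduced for illustration.

```python
import os
import tempfile


class SimulatedTaskFailure(Exception):
    """Hypothetical stand-in for a task that dies mid-write."""


def write_attempt(path, data, fail=False):
    # Mode "xb" creates the file exclusively and raises FileExistsError
    # if it already exists, analogous to overwrite being set to false.
    with open(path, "xb") as out:
        if fail:
            # The file has been created at this point but holds no bytes.
            raise SimulatedTaskFailure("interrupted before writing")
        out.write(data)


def retry_outcome(path, data):
    """Fail the first attempt, then retry against the same path."""
    try:
        write_attempt(path, data, fail=True)
    except SimulatedTaskFailure:
        pass  # the empty file is left behind
    leftover_size = os.path.getsize(path)
    try:
        write_attempt(path, data)
        return leftover_size, "ok"
    except FileExistsError:
        return leftover_size, "FileExistsError"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(retry_outcome(os.path.join(tmp, "part-00000"), b"rows"))
```

In this sketch the retry fails even though the leftover file is empty, because the exclusive create only checks that the path exists.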

> Write mode ignored when using CodecStreams
> ------------------------------------------
>
>                 Key: SPARK-30442
>                 URL: https://issues.apache.org/jira/browse/SPARK-30442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.4.4
>            Reporter: Jesse Collins
>            Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues, particularly with AWS tools, that make it impossible to retry a write.
> Ideally, this should be read from the write mode set for the DataWriter that is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]
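
One possible reading of the suggestion in the description, sketched in Python rather than Spark's Scala: the stream factory takes an overwrite flag derived from the write mode instead of fixing it to false. The `SaveMode` names mirror Spark's save modes, but `overwrite_for`, `create_output_stream`, and `write_with_mode` are hypothetical helpers, and mapping only the overwrite mode to a per-file overwrite is an assumption, not Spark's actual behavior.

```python
import enum
import os
import tempfile


class SaveMode(enum.Enum):
    # Names mirror Spark's save modes; this enum itself is illustrative.
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"


def overwrite_for(mode):
    # Assumption: only OVERWRITE allows replacing an existing file.
    return mode is SaveMode.OVERWRITE


def create_output_stream(path, overwrite):
    # "wb" truncates an existing file; "xb" raises FileExistsError.
    return open(path, "wb" if overwrite else "xb")


def write_with_mode(path, data, mode):
    with create_output_stream(path, overwrite_for(mode)) as out:
        out.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-00000")
        open(path, "xb").close()  # empty file left by a failed attempt
        write_with_mode(path, b"rows", SaveMode.OVERWRITE)
        print(os.path.getsize(path))
```

With the flag passed through, a retry in overwrite mode replaces the empty leftover file, while the other modes still refuse to clobber existing output.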


