Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2020/01/07 18:36:00 UTC

[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams

    [ https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009963#comment-17009963 ] 

Maxim Gekk commented on SPARK-30442:
------------------------------------

> This can cause issues, particularly with aws tools, that make it impossible to retry.

Could you clarify how it makes retry impossible? When the mode is set to overwrite, Spark deletes the entire folder and writes new files, so there should be no clashes. In append mode, new files are added; Spark does not append to existing files. In what situation would files need to be overwritten?
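
To make the two modes concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model built on the standard library only, not Spark's implementation: `save`, `create_file`, and the `part-<uuid>.txt` naming are hypothetical names chosen for illustration, and the final retry case is an assumed scenario rather than one confirmed in this thread.

```python
import os
import shutil
import tempfile
import uuid


def create_file(path, data, overwrite=False):
    """Toy stand-in for a file-level create call with an overwrite flag."""
    # Mode "x" raises FileExistsError if the file already exists, which
    # mirrors what overwrite=False means at the level of a single file.
    mode = "w" if overwrite else "x"
    with open(path, mode) as f:
        f.write(data)


def save(out_dir, rows, mode):
    """Toy model of the directory-level write modes (hypothetical helper)."""
    if mode == "overwrite" and os.path.isdir(out_dir):
        shutil.rmtree(out_dir)  # overwrite: delete the entire folder first
    os.makedirs(out_dir, exist_ok=True)
    # Every write gets a fresh, unique file name, so an existing file is
    # never reopened in either mode.
    name = f"part-{uuid.uuid4().hex}.txt"
    create_file(os.path.join(out_dir, name), "\n".join(rows))
    return name


out = os.path.join(tempfile.mkdtemp(), "table")

first = save(out, ["a"], "append")
second = save(out, ["b"], "append")
files_after_append = sorted(os.listdir(out))  # two distinct files

third = save(out, ["c"], "overwrite")
files_after_overwrite = os.listdir(out)  # only the newly written file

# A clash needs the same file name to be created twice, for example an
# assumed retry that reuses a name while overwrite stays False.
try:
    create_file(os.path.join(out, third), "retry")
    clash = False
except FileExistsError:
    clash = True
```

In this model neither mode ever collides with an existing file; only the last step, which deliberately reuses a file name, fails. That is the kind of situation the question above asks the reporter to describe.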

> Write mode ignored when using CodecStreams
> ------------------------------------------
>
>                 Key: SPARK-30442
>                 URL: https://issues.apache.org/jira/browse/SPARK-30442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.4.4
>            Reporter: Jesse Collins
>            Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues, particularly with aws tools, that make it impossible to retry.
> Ideally, this should be read from the write mode set for the DataWriter that is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]



