Posted to issues@spark.apache.org by "Abhishek Madav (Jira)" <ji...@apache.org> on 2020/02/27 22:50:00 UTC
[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams
[ https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047024#comment-17047024 ]
Abhishek Madav commented on SPARK-30442:
----------------------------------------
When a task fails, for example because it cannot write to local disk or is interrupted, the output file is left empty but already materialized on the file system. The next task attempt that retries the write to this location sees the existing file and fails with a FileAlreadyExistsException, so the write is not resilient to task failures.
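The failure mode can be reproduced outside Spark with a small local-file analogy. This is only a sketch, not Spark or Hadoop code: write_attempt and run_with_retry are hypothetical helpers, and Python's exclusive-create mode "x" stands in for creating the output stream with overwrite set to false.

```python
import os
import tempfile


def write_attempt(path, data, fail_midway=False):
    """Hypothetical stand-in for one task attempt writing its output file.

    Mode "x" is exclusive create: it raises FileExistsError if the path
    already exists, analogous to creating a stream with overwrite=false.
    """
    with open(path, "x") as out:
        if fail_midway:
            # Simulate a task that dies after the file was created
            # but before any data was written.
            raise RuntimeError("simulated task failure")
        out.write(data)


def run_with_retry(path, data):
    """Run a failing first attempt, then retry; return the retry outcome."""
    try:
        write_attempt(path, data, fail_midway=True)
    except RuntimeError:
        pass  # first attempt failed, leaving an empty file behind
    leftover_size = os.path.getsize(path)
    try:
        write_attempt(path, data)
        return leftover_size, "retry succeeded"
    except FileExistsError:
        return leftover_size, "retry failed: file already exists"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_with_retry(os.path.join(tmp, "part-00000"), "row\n"))
```

In this analogy the first attempt leaves a zero-byte file, and the retry is rejected only because that file exists, not because of any problem with the data itself.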
> Write mode ignored when using CodecStreams
> ------------------------------------------
>
> Key: SPARK-30442
> URL: https://issues.apache.org/jira/browse/SPARK-30442
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.4.4
> Reporter: Jesse Collins
> Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues that make retries impossible, particularly with AWS tools.
> Ideally, this should be read from the write mode set for the DataWriter that is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]
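The suggestion in the description, deriving the overwrite flag from the write mode instead of hardcoding it, can be sketched with local files as follows. Everything here is an assumption for illustration: overwrite_for_mode, open_output, and retry_over_leftover are hypothetical names, the mode strings are placeholders, and the mapping from write mode to overwrite flag is not taken from Spark's implementation.

```python
import os
import tempfile


def overwrite_for_mode(save_mode):
    """Assumed mapping: only an "overwrite" mode may replace existing files."""
    return save_mode.lower() == "overwrite"


def open_output(path, save_mode):
    """Hypothetical helper that picks the open mode from the write mode.

    "w" truncates an existing file; "x" refuses to touch one.
    """
    return open(path, "w" if overwrite_for_mode(save_mode) else "x")


def retry_over_leftover(save_mode):
    """Retry a write over an empty leftover file; return the file contents,
    or "failed" if the existing file blocked the retry."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-00000")
        open(path, "x").close()  # empty file left by a failed attempt
        try:
            with open_output(path, save_mode) as out:
                out.write("row\n")
        except FileExistsError:
            return "failed"
        with open(path) as written:
            return written.read()


if __name__ == "__main__":
    print(repr(retry_over_leftover("overwrite")))
    print(repr(retry_over_leftover("errorifexists")))
```

With the flag derived from the mode, the retry can replace the empty leftover file when overwriting was requested, while other modes keep the stricter behaviour.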