Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/07/19 08:41:00 UTC

[jira] [Resolved] (SPARK-36187) Commit collision avoidance in dynamicPartitionOverwrite for non-Parquet formats

     [ https://issues.apache.org/jira/browse/SPARK-36187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-36187.
----------------------------------
    Resolution: Incomplete

> Commit collision avoidance in dynamicPartitionOverwrite for non-Parquet formats
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-36187
>                 URL: https://issues.apache.org/jira/browse/SPARK-36187
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Tony Zhang
>            Priority: Minor
>
> Hi, my question is specifically about [PR #29000|https://github.com/apache/spark/pull/29000/files#r649580767] for SPARK-29302.
> As I understand it, the PR introduces a different staging directory at job commit to avoid commit collisions. In SQLHadoopMapReduceCommitProtocol, the new staging directory is set only when SQLConf.OUTPUT_COMMITTER_CLASS is not null: [code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala#L58], and in the current Spark repo, OUTPUT_COMMITTER_CLASS seems to be set only for the Parquet format: [code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L96].
> However, I didn't find similar code that sets that config for ORC. If I understand correctly, when SQLConf.OUTPUT_COMMITTER_CLASS is not set (as with the ORC format), SQLHadoopMapReduceCommitProtocol still uses the original staging directory, which may void the PR's fix, so commit collisions may still happen. In that case the fix is effective only for Parquet, not for other formats.
> Could someone confirm whether this is a potential problem? Thanks!
> [~duripeng] [~dagrawal3409]
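
The branch the reporter describes can be illustrated with a toy model. This is a sketch of the reporter's reading only, not Spark code and not a confirmed description of Spark's behaviour: the function choose_committer_output_path, its parameters, the committer name, and the paths are all hypothetical names introduced here for illustration.

```python
from typing import Optional


def choose_committer_output_path(
    output_committer_class: Optional[str],
    dynamic_partition_overwrite: bool,
    output_path: str,
    staging_dir: str,
) -> str:
    """Toy model (not Spark code) of the branch described in the question."""
    if output_committer_class is not None and dynamic_partition_overwrite:
        # A committer class is configured (the Parquet case, per the reporter):
        # the job-specific staging directory is used.
        return staging_dir
    # No committer class is configured (the ORC case, per the reporter):
    # the original location is used, so the new staging directory never applies.
    return output_path


# Hypothetical paths and committer name, for illustration only.
OUTPUT = "/data/out"
STAGING = "/data/out/.staging-job1"

parquet_like = choose_committer_output_path("SomeCommitter", True, OUTPUT, STAGING)
orc_like = choose_committer_output_path(None, True, OUTPUT, STAGING)

print("committer class set:  ", parquet_like)
print("committer class unset:", orc_like)
```

Under this reading, two concurrent jobs writing a format with no configured committer class would both resolve to the same original location, which is the collision the question asks about.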


