Posted to issues@spark.apache.org by "koert kuipers (Jira)" <ji...@apache.org> on 2019/11/13 21:29:00 UTC

[jira] [Commented] (SPARK-28945) Allow concurrent writes to different partitions with dynamic partition overwrite

    [ https://issues.apache.org/jira/browse/SPARK-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973715#comment-16973715 ] 

koert kuipers commented on SPARK-28945:
---------------------------------------

i understand there is a great deal of complexity in the committer, and this might require more work to get right.

but it's still unclear to me whether the committer is doing anything at all in the case of dynamic partition overwrite.
what do i lose by disabling all committer activity (committer.setupJob, committer.commitJob, etc.) when dynamicPartitionOverwrite is true? and if i lose nothing, is that a good thing, or does it mean i should be worried about the current state?

> Allow concurrent writes to different partitions with dynamic partition overwrite
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-28945
>                 URL: https://issues.apache.org/jira/browse/SPARK-28945
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: koert kuipers
>            Priority: Minor
>
> It is desirable to run concurrent jobs that write to different partitions within the same baseDir using partitionBy and dynamic partitionOverwriteMode.
> See for example here:
> https://stackoverflow.com/questions/38964736/multiple-spark-jobs-appending-parquet-data-to-same-base-path-with-partitioning
> Or the discussion here:
> https://github.com/delta-io/delta/issues/9
> This doesn't seem that difficult. I suspect the only changes needed are in org.apache.spark.internal.io.HadoopMapReduceCommitProtocol, which already has a flag for dynamicPartitionOverwrite. I got a quick test to work by disabling all committer activity (committer.setupJob, committer.commitJob, etc.) when dynamicPartitionOverwrite is true.
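
To make the request above concrete, here is a toy model of dynamic partition overwrite, written with only the Python standard library. It is not Spark's code and does not reproduce HadoopMapReduceCommitProtocol. The function name `write_job`, the `.staging-<job_id>` directory name, and the file naming are all assumptions made for this sketch. The only point it illustrates is that when each job stages its output in a private directory and then swaps in just the partitions it wrote, jobs that touch different partitions under the same base directory have no shared state to fight over.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def write_job(base_dir: Path, job_id: str, rows: Dict[str, List[str]]) -> None:
    """Toy dynamic partition overwrite: replace only the partitions this job wrote.

    `rows` maps a partition directory name (e.g. "date=2019-11-13") to the
    lines stored in it. All names here are hypothetical, chosen for the sketch.
    """
    # Stage everything in a directory private to this job.
    staging = base_dir / f".staging-{job_id}"
    for partition, lines in rows.items():
        part_dir = staging / partition
        part_dir.mkdir(parents=True)
        (part_dir / f"part-{job_id}.txt").write_text("\n".join(lines))

    # "Commit": swap each staged partition into place; other partitions
    # under base_dir are never read, listed, or deleted.
    for partition in rows:
        final_dir = base_dir / partition
        if final_dir.exists():
            shutil.rmtree(final_dir)
        (staging / partition).rename(final_dir)

    # Clean up only this job's staging directory.
    shutil.rmtree(staging)


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    write_job(base, "job-a", {"date=2019-11-12": ["a1", "a2"]})
    write_job(base, "job-b", {"date=2019-11-13": ["b1"]})
    # Overwriting job-a's partition leaves job-b's partition untouched.
    write_job(base, "job-c", {"date=2019-11-12": ["c1"]})
    print(sorted(p.name for p in base.iterdir()))
    shutil.rmtree(base)
```

The sketch runs the jobs one after another and makes no attempt at atomicity or failure recovery, which is part of the complexity a real committer has to handle.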


