Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2018/06/19 01:32:00 UTC

[jira] [Created] (SPARK-24589) OutputCommitCoordinator may allow duplicate commits

Marcelo Vanzin created SPARK-24589:
--------------------------------------

             Summary: OutputCommitCoordinator may allow duplicate commits
                 Key: SPARK-24589
                 URL: https://issues.apache.org/jira/browse/SPARK-24589
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1, 2.2.1
            Reporter: Marcelo Vanzin


This is a sibling bug to SPARK-24552. While investigating the source of that bug, it was found that the output committer currently allows duplicate commits when there are stage retries and tasks with the same task attempt number (one in each stage that currently has running tasks) try to commit their output.

This can lead to duplicate data in the output.
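
To illustrate how this can happen, here is a minimal sketch in Python. It is a toy model, not Spark's code: the class names, the can_commit signature, and the partition number are all hypothetical. It assumes the coordinator remembers the winning committer for a partition by task attempt number alone, so two tasks from different stage attempts that share an attempt number cannot be told apart.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class NaiveCoordinator:
    """Toy model (hypothetical): remembers the winner by task attempt number only."""
    winners: Dict[int, int] = field(default_factory=dict)  # partition -> task attempt

    def can_commit(self, stage_attempt: int, partition: int, task_attempt: int) -> bool:
        # The first requester for a partition becomes the winner.
        winner = self.winners.setdefault(partition, task_attempt)
        # stage_attempt is ignored, so tasks from different stage attempts
        # with the same task attempt number look identical here.
        return winner == task_attempt


@dataclass
class StageAwareCoordinator:
    """Toy model (hypothetical): identifies the winner by (stage attempt, task attempt)."""
    winners: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def can_commit(self, stage_attempt: int, partition: int, task_attempt: int) -> bool:
        requester = (stage_attempt, task_attempt)
        winner = self.winners.setdefault(partition, requester)
        return winner == requester


def count_commits(coordinator) -> int:
    # Assumed scenario: partition 7 has a running task in stage attempt 0 and
    # another in stage attempt 1, and both carry task attempt number 0.
    requests = [(0, 7, 0), (1, 7, 0)]
    return sum(coordinator.can_commit(*request) for request in requests)


naive_commits = count_commits(NaiveCoordinator())
stage_aware_commits = count_commits(StageAwareCoordinator())
```

In this sketch the naive coordinator authorizes both requests, which is the duplicate commit described above, while the stage-aware one authorizes only the first. Keying on the pair is just one possible way to tell the two tasks apart; this report does not prescribe a fix.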


