Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/05/26 23:50:22 UTC

[jira] [Assigned] (SPARK-7829) SortShuffleWriter writes inconsistent data & index files on stage retry

     [ https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-7829:
-----------------------------------

    Assignee: Apache Spark  (was: Imran Rashid)

> SortShuffleWriter writes inconsistent data & index files on stage retry
> -----------------------------------------------------------------------
>
>                 Key: SPARK-7829
>                 URL: https://issues.apache.org/jira/browse/SPARK-7829
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Imran Rashid
>            Assignee: Apache Spark
>
> When a stage is retried, a shuffle map task may be rerun even if it already completed successfully.  If the rerun happens to be scheduled on the same executor, its output is *appended* to the old data file, while the index file still assumes the data starts at position 0.  The shuffle map output then appears corrupt, because the index file points to the wrong location when the data file is read.
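
The failure mode described above can be illustrated with a small stand-alone sketch. This is not Spark's SortShuffleWriter code: the file layout (a data file of concatenated partitions plus an index file of big-endian 8-byte offsets), the function names, and the `append` flag are all assumptions made for illustration. The retry also deliberately writes different bytes than the first attempt, purely so the mismatch is visible.

```python
import os
import struct
import tempfile


def write_map_output(data_path, index_path, partitions, append):
    """Toy writer (hypothetical, not Spark's API).

    Writes the partition bytes to the data file and their offsets to the
    index file. With append=True the data file is opened in append mode,
    mimicking the behaviour in the report.
    """
    with open(data_path, "ab" if append else "wb") as data:
        for chunk in partitions:
            data.write(chunk)

    # The index is always computed as if the data started at position 0,
    # regardless of how long the data file already was.
    offsets = [0]
    for chunk in partitions:
        offsets.append(offsets[-1] + len(chunk))
    with open(index_path, "wb") as index:
        index.write(struct.pack(f">{len(offsets)}q", *offsets))


def read_partition(data_path, index_path, i):
    """Read partition i by seeking to the offset recorded in the index."""
    with open(index_path, "rb") as index:
        raw = index.read()
    offsets = struct.unpack(f">{len(raw) // 8}q", raw)
    with open(data_path, "rb") as data:
        data.seek(offsets[i])
        return data.read(offsets[i + 1] - offsets[i])


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "shuffle.data")
        index_path = os.path.join(tmp, "shuffle.index")
        first_attempt = [b"aaaa", b"bb"]
        retry = [b"cc", b"dddd"]

        # First attempt, then a retry on the "same executor" that appends.
        write_map_output(data_path, index_path, first_attempt, append=True)
        write_map_output(data_path, index_path, retry, append=True)
        appended = [read_partition(data_path, index_path, i) for i in range(2)]

        # Same retry, but overwriting the data file instead of appending.
        write_map_output(data_path, index_path, retry, append=False)
        overwritten = [read_partition(data_path, index_path, i) for i in range(2)]
        return appended, overwritten


if __name__ == "__main__":
    appended, overwritten = demo()
    print(appended)      # [b'aa', b'aabb']  (index points into the old bytes)
    print(overwritten)   # [b'cc', b'dddd']  (index and data agree)
```

In the appending case the data file holds the first attempt's bytes followed by the retry's, but the index only describes the retry and starts at 0, so a reader gets slices of the old output. Overwriting the data file is just one way to make the two files agree in this toy; it is not a statement about how the issue was resolved in Spark.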



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
