Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/10/07 00:43:33 UTC

[jira] [Resolved] (SPARK-2530) Relax incorrect assumption of one ExternalAppendOnlyMap per thread

     [ https://issues.apache.org/jira/browse/SPARK-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-2530.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

This was fixed by SPARK-2711.

> Relax incorrect assumption of one ExternalAppendOnlyMap per thread
> ------------------------------------------------------------------
>
>                 Key: SPARK-2530
>                 URL: https://issues.apache.org/jira/browse/SPARK-2530
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.1
>            Reporter: Andrew Or
>             Fix For: 1.1.0
>
>
> Originally reported by Matei.
> Our current implementation of ExternalAppendOnlyMap (EAOM) assumes only one map is created per task. This does not hold in the following case, however:
> {code}
> rdd1.join(rdd2).reduceByKey(...)
> {code}
> This is because reduceByKey does a map-side combine, which creates a second EAOM that streams from the EAOM previously created by the same thread to aggregate values from the join.
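> As a minimal, self-contained sketch of a job with this shape (the object name, data, and local master are made up for illustration):
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.SparkContext._  // pair RDD functions (join, reduceByKey)
>
> object TwoMapsPerTask {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(
>       new SparkConf().setMaster("local[2]").setAppName("two-maps-per-task"))
>     val rdd1 = sc.parallelize(Seq((1, "a"), (2, "b")))
>     val rdd2 = sc.parallelize(Seq((1, "x"), (2, "y")))
>     // The join aggregates values by key in one EAOM; the map-side combine of
>     // reduceByKey then builds a second EAOM in the same task that consumes
>     // the first one's iterator.
>     rdd1.join(rdd2).reduceByKey((a, b) => a).collect().foreach(println)
>     sc.stop()
>   }
> }
> {code}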
> The more concerning issue is the following: we currently maintain a global shuffle memory map (thread ID -> memory used by that thread for shuffles). If we create two EAOMs in the same thread, the memory reported by the first map may be clobbered by that of the second. This has very adverse consequences when the first map is huge but the second is just starting out: we end up believing the thread uses much less memory than it actually does.
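> The clobbering is a consequence of keying that map by thread ID alone. A toy sketch of the accounting hazard (not the actual Spark code, just an illustration):
> {code}
> import scala.collection.mutable
>
> // Toy model of the global shuffle memory map: thread ID -> bytes reserved.
> val shuffleMemoryMap = mutable.HashMap[Long, Long]()
>
> def reserve(bytes: Long): Unit = {
>   val threadId = Thread.currentThread().getId
>   // Keyed only by thread, so a second map in the same thread overwrites
>   // (rather than adds to) the first map's reservation.
>   shuffleMemoryMap(threadId) = bytes
> }
>
> reserve(512L * 1024 * 1024)  // first EAOM has grown to 512 MB
> reserve(1L * 1024 * 1024)    // second EAOM starts out: entry now reads 1 MB
> // Bookkeeping now claims ~1 MB for this thread while it really holds ~513 MB.
> {code}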



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org