Posted to issues@spark.apache.org by "Al M (JIRA)" <ji...@apache.org> on 2018/06/06 12:29:00 UTC
[jira] [Created] (SPARK-24474) Cores are left idle when there are a lot of stages to run
Al M created SPARK-24474:
----------------------------
Summary: Cores are left idle when there are a lot of stages to run
Key: SPARK-24474
URL: https://issues.apache.org/jira/browse/SPARK-24474
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 2.2.0
Reporter: Al M
I've observed an issue happening consistently when:
* A job contains a join of two datasets
* One dataset is much larger than the other
* Both datasets require some processing before they are joined
What I have observed is:
* Two stages are initially active, one processing each of the two datasets
** These stages are run in parallel
** One stage has significantly more tasks than the other (e.g. one has 30k tasks and the other has 2k tasks)
** Spark allocates a similar (though not exactly equal) number of cores to each stage
* The first stage (for the smaller dataset) completes
** Now there is only one stage running
** It still has many tasks left (usually > 20k tasks)
** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 103)
** This continues until the second stage completes
* The second stage completes, and the third (the stage that actually joins the data) begins
** This stage works fine, no cores are idle (e.g. Total Cores = 200, active tasks = 200)
Other interesting things about this:
* It seems that when multiple stages are active and one of them finishes, its cores are not actually released to the other stages
* I can't reproduce this locally on my machine, only on a cluster with YARN enabled
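
To make the numbers above concrete, here is a small toy model in plain Python. It is not Spark's scheduler and says nothing about the root cause; it only replays the arithmetic of the symptom. The function `active_tasks`, the flag `release_on_finish`, and the 103/97 split of the 200 cores are assumptions made for illustration (the report only says the split is "similar though not exactly equal").

```python
# Toy model of the reported symptom. This is NOT Spark's scheduler; every
# name and the 103/97 core split below are illustrative assumptions.

def active_tasks(total_cores, pending, share, release_on_finish):
    """Count tasks that can run at once.

    pending: remaining tasks per stage (0 means the stage has finished).
    share: cores each stage held while both stages were running.
    release_on_finish: if False, cores held by a finished stage stay unused
        (the behaviour described in this report); if True, they are handed
        to stages that still have pending tasks.
    """
    # Each stage runs at most as many tasks as it has cores and pending work.
    running = {stage: min(share[stage], pending[stage]) for stage in pending}
    if release_on_finish:
        free = total_cores - sum(running.values())
        for stage in pending:
            extra = min(free, pending[stage] - running[stage])
            running[stage] += extra
            free -= extra
    return sum(running.values())


TOTAL_CORES = 200
share = {"large": 103, "small": 97}      # assumed split, roughly equal
pending = {"large": 20_000, "small": 0}  # small stage done, large has >20k tasks left

stuck = active_tasks(TOTAL_CORES, pending, share, release_on_finish=False)
ideal = active_tasks(TOTAL_CORES, pending, share, release_on_finish=True)

print(f"cores not released: {stuck} active, {TOTAL_CORES - stuck} idle")  # 103 active, 97 idle
print(f"cores released:     {ideal} active, {TOTAL_CORES - ideal} idle")  # 200 active, 0 idle
```

With `release_on_finish=False` the model matches what I see on the cluster (103 active tasks out of 200 cores); with `release_on_finish=True` it matches what I would expect to see, and what the join stage actually gets.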