Posted to user@spark.apache.org by praveen seluka <pr...@gmail.com> on 2014/09/12 08:44:33 UTC

Yarn Over-allocating Containers

Hi all

I am seeing a strange issue in Spark on YARN (stable). Let me know if this
is known, or if I am missing something, as it looks very fundamental.

Launch a Spark job with 2 containers. addContainerRequest is called twice,
and then allocate is called on AMRMClient. This gets 2 containers allocated.
Fine so far.

The reporter thread starts. Now, if 1 of the containers dies, this is what
happens: the reporter thread makes another addContainerRequest call, and the
next allocate *actually* gets back 3 containers (the total number of
container requests since the beginning). The reporter thread has a check to
discard (release) excess containers and ends up releasing 2.

In summary, the job starts with 2 containers, 1 dies (let's say), the
reporter thread adds 1 more container request, subsequently gets back 3
allocated containers (from YARN), and discards 2 as it needed just 1.
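
To make the sequence concrete, here is a toy model of the behaviour
described above. It is only a sketch: ToyResourceManager, simulate, and the
container names are hypothetical and do not use the real AMRMClient API. It
assumes, as the observation suggests, that requests added earlier are never
cleared, so each allocate answers every request made so far.

```python
class ToyResourceManager:
    """Hypothetical stand-in for the YARN side; not the real AMRMClient."""

    def __init__(self):
        self.outstanding = 0  # requests added so far and never cleared
        self.next_id = 0      # used only to give containers unique names

    def add_container_request(self):
        self.outstanding += 1

    def allocate(self):
        # Assumption: nothing removes satisfied requests, so every
        # request made since the beginning is answered again.
        granted = []
        for _ in range(self.outstanding):
            granted.append(f"container_{self.next_id}")
            self.next_id += 1
        return granted


def simulate(target=2):
    rm = ToyResourceManager()

    # Startup: one request per wanted container, then allocate.
    for _ in range(target):
        rm.add_container_request()
    running = rm.allocate()
    first_allocation = len(running)

    # One container dies; the reporter thread asks for a replacement.
    running.pop()
    missing = target - len(running)
    for _ in range(missing):
        rm.add_container_request()

    # The next allocate returns more containers than are missing.
    granted = rm.allocate()
    kept = granted[:missing]      # containers actually needed
    released = granted[missing:]  # excess, discarded by the check
    running.extend(kept)

    return first_allocation, len(granted), len(kept), len(released)


if __name__ == "__main__":
    first, granted, kept, released = simulate()
    print(f"first allocate: {first}, second allocate: {granted}, "
          f"kept: {kept}, released: {released}")
```

With the default target of 2, the model reproduces the numbers above: 2
containers at startup, 3 returned after the single replacement request, 1
kept and 2 released.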

Thanks
Praveen

Re: Yarn Over-allocating Containers

Posted by Sandy Ryza <sa...@cloudera.com>.

Hi Praveen,

I believe you are correct.  I noticed this a little while ago and had a fix
for it as part of SPARK-1714, but that's been delayed.  I'll look into this
a little deeper and file a JIRA.

-Sandy
