Posted to issues@spark.apache.org by "prakhar jauhari (JIRA)" <ji...@apache.org> on 2015/07/28 06:11:04 UTC

[jira] [Commented] (SPARK-9396) Spark yarn allocator does not call "removeContainerRequest" for allocated Container requests, resulting in bloated ask[] to YARN RM.

    [ https://issues.apache.org/jira/browse/SPARK-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643822#comment-14643822 ] 

prakhar jauhari commented on SPARK-9396:
----------------------------------------

This happens because YARN's AM client (AMRMClientImpl) does not remove old container requests from its internal request table until the application's AM calls removeContainerRequest for the fulfilled requests.

Spark-1.2: Spark's AM does not call removeContainerRequest for fulfilled container requests.

Spark-1.3: calls removeContainerRequest for the container requests it can match to allocated containers. I tried the same test case of killing one executor with Spark 1.3, and the ask[] in that case was for 1 container.
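
To illustrate the contract involved, here is a minimal sketch against the Hadoop 2.4 AMRMClient API. The setup values (host, port, progress) are illustrative and this is not Spark code; the capability matches the logs quoted below.

    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.api.records.{Priority, Resource}
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

    val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
    amClient.init(new YarnConfiguration())
    amClient.start()
    amClient.registerApplicationMaster("localhost", 0, "")  // illustrative values

    // addContainerRequest() records the request in the client's internal table;
    // it is re-sent in every allocate() ask until removeContainerRequest() is called.
    val request = new ContainerRequest(
      Resource.newInstance(2432, 4),   // memory MB, vCores -- matches the log below
      null, null,                      // no node/rack preference
      Priority.newInstance(1))
    amClient.addContainerRequest(request)

    val response = amClient.allocate(0.1f)
    // Getting a container back does NOT clear the request. Without the call below,
    // the stale request is re-sent, and each executor loss inflates the ask by one.
    amClient.removeContainerRequest(request)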

As long as the cluster is large enough to satisfy the bloated container requests, the containers come back to the Spark YARN allocator in the allocate response; the allocator launches only the missing number of executors and releases the extra allocated containers (see the sketch below).
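
Roughly, that handling looks like the following sketch. This is not the actual YarnAllocationHandler code; targetExecutors, runningExecutors and launchExecutor are assumed names for illustration.

    import org.apache.hadoop.yarn.api.records.Container
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

    var targetExecutors = 2      // assumed bookkeeping, for illustration only
    var runningExecutors = 1
    def launchExecutor(container: Container): Unit = { runningExecutors += 1 }

    def handleAllocateResponse(amClient: AMRMClient[ContainerRequest],
                               allocated: Seq[Container]): Unit = {
      val missing = math.max(0, targetExecutors - runningExecutors)
      val (toLaunch, surplus) = allocated.splitAt(missing)
      toLaunch.foreach(launchExecutor)   // launch only as many as are actually missing
      // Containers granted against stale, never-removed requests are handed back.
      surplus.foreach(c => amClient.releaseAssignedContainer(c.getId))
    }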

The problem gets worse for a long-running job with large executor memory requirements. Whenever an executor is killed, the next ask to the YARN ResourceManager (RM) is for n+1 containers. The RM may serve it if it still has enough resources; otherwise it starts reserving cluster resources for containers that Spark does not even need in the first place, which leads to inefficient utilization of the cluster.

I have added changes to the Spark 1.2.1 code to remove fulfilled container requests.
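
A sketch of the approach (comparable to what the 1.3 allocator does; this is not the exact patch, and the helper name is mine):

    import org.apache.hadoop.yarn.api.records.{Container, ResourceRequest}
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

    // For every container YARN hands back, drop one matching outstanding request
    // so it is not re-sent in the next allocate() ask.
    def removeFulfilledRequests(amClient: AMRMClient[ContainerRequest],
                                allocated: Seq[Container]): Unit = {
      allocated.foreach { container =>
        val matching = amClient.getMatchingRequests(
          container.getPriority, ResourceRequest.ANY, container.getResource)
        if (!matching.isEmpty && !matching.get(0).isEmpty) {
          amClient.removeContainerRequest(matching.get(0).iterator().next())
        }
      }
    }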


> Spark yarn allocator does not call "removeContainerRequest" for allocated Container requests, resulting in bloated ask[] to YARN RM.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9396
>                 URL: https://issues.apache.org/jira/browse/SPARK-9396
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.2.1
>         Environment: Spark-1.2.1 on hadoop-yarn-2.4.0 cluster. All servers in cluster running Linux version 2.6.32.
>            Reporter: prakhar jauhari
>
> Note: The attached logs contain log lines that I added (on the Spark YARN allocator side and the YARN client side) for debugging purposes.
> My Spark job is configured for 2 executors; on killing 1 executor the ask is for 3.
> On killing an executor - resource request logs:
> *************Killed container: ask is for 3 containers instead of 1***********
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Will allocate 1 executor containers, each with 2432 MB memory including 384 MB overhead
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: numExecutors: 1
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: host preferences is empty
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:4>
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: this.ask = [{Priority: 1, Capability: <memory:2432, vCores:4>, # Containers: 3, Location: *, Relax Locality: true}]
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: allocateRequest = ask { priority{ priority: 1 } resource_name: "*" capability { memory: 2432 virtual_cores: 4 } num_containers: 3 relax_locality: true } blacklist_request { } response_id: 354 progress: 0.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org