Posted to issues@spark.apache.org by "Adam Kennedy (JIRA)" <ji...@apache.org> on 2018/10/30 20:10:00 UTC

[jira] [Updated] (SPARK-25888) Service requests for persist() blocks via external service after dynamic deallocation

     [ https://issues.apache.org/jira/browse/SPARK-25888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kennedy updated SPARK-25888:
---------------------------------
    Summary: Service requests for persist() blocks via external service after dynamic deallocation  (was: Service requests for persist() block via external service after dynamic deallocation)

> Service requests for persist() blocks via external service after dynamic deallocation
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-25888
>                 URL: https://issues.apache.org/jira/browse/SPARK-25888
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager, Shuffle, YARN
>    Affects Versions: 2.3.2
>            Reporter: Adam Kennedy
>            Priority: Major
>
> Large, highly multi-tenant Spark on YARN clusters with big user populations and diverse job mixes often display terrible utilization rates (we have observed as low as 3-7% CPU at maximum container allocation), due in large part to persist() blocks (DISK or DISK+MEMORY) preventing dynamic deallocation.
> In situations where an external shuffle service is present (which is typical on clusters of this type), we already solve this for the shuffle block case by offloading the I/O handling of shuffle blocks to the external service, allowing dynamic deallocation to proceed.
> Allowing Executors to transfer persist() blocks to some external "shuffle" service in a similar manner would be an enormous win for Spark multi-tenancy, as it would limit deallocation blocking to MEMORY-only cache() scenarios.
> I'm not sure if I'm correct, but I seem to recall seeing in the original external shuffle service commits that something like this may have been considered at the time, but that getting shuffle blocks moved to the external shuffle service was the first priority.
> With support for external persist() DISK blocks in place, we could then also handle deallocation of DISK+MEMORY blocks: the in-memory copy could first be dropped, converting the block to DISK only, and the block could then be transferred to the shuffle service.
> We have tried to resolve the persist() issue via extensive user training, but that has typically only improved the worst offenders from around 10% utilization up to around 40-60%, as the need for persist() is often legitimate and occurs during the middle stages of a job.
> In a healthy multi-tenant scenario, a job could spool up to, say, 10,000 cores, persist() data, release executors across a long tail (down to, say, 100 cores), and then spool back up to 10,000 cores for the following stage without any impact on the persist() data.
> In an ideal world, if a new executor started up on a node on which blocks had been transferred to the shuffle service, the new executor might even be able to "recapture" control of those blocks (if that would help with performance in some way).
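
For context, the external-shuffle-service-plus-dynamic-allocation setup the description refers to is typically wired up with a handful of settings. A minimal Scala sketch of that configuration follows; the property names are the standard Spark 2.x ones, while the application name and the numeric values are purely illustrative:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Dynamic allocation backed by the external shuffle service, so completed
    // shuffle blocks remain servable after their executor has been released.
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-sketch")                   // illustrative name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")              // NodeManager-side service on YARN
      .set("spark.dynamicAllocation.minExecutors", "2")          // illustrative values
      .set("spark.dynamicAllocation.maxExecutors", "500")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
      // Executors holding persist()ed blocks fall under a separate timeout
      // (unset, i.e. effectively infinite, by default), which is why cached data pins them:
      .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")

    val spark = SparkSession.builder().config(conf).getOrCreate()

Even with cachedExecutorIdleTimeout set, releasing an executor today discards its cached blocks; the proposal above is about serving the DISK portion externally instead.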
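The usage pattern in the description (spool up for a wide stage, persist(), ride out a long tail, spool back up) might look roughly like the sketch below, reusing the spark session from the previous snippet; the paths, column names, and threshold are all illustrative. Under current behavior, the MEMORY_AND_DISK blocks pin the executors that hold them, which is exactly what serving the DISK portion from the external service would relax:

    import org.apache.spark.sql.functions.{col, count}
    import org.apache.spark.storage.StorageLevel

    // Wide, expensive stage: benefits from a large executor count.
    val features = spark.read.parquet("/data/events")            // illustrative input
      .groupBy("userId")
      .agg(count("*").as("n"))

    // Persisted for reuse in later stages. With MEMORY_AND_DISK the blocks live
    // on specific executors, so dynamic allocation cannot release those executors
    // without throwing the cached data away.
    features.persist(StorageLevel.MEMORY_AND_DISK)
    features.count()                                              // materialize the cache

    // Long tail: only a few tasks, yet executors holding cached blocks stay allocated.
    val heavyUsers = features.filter(col("n") > 1000).count()

    // A later wide stage reuses the persisted data and wants many executors again.
    features.join(spark.read.parquet("/data/profiles"), "userId")
      .write.parquet("/output/joined")                            // illustrative output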



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org