Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:33:57 UTC
[jira] [Resolved] (SPARK-12147) Off heap storage and dynamicAllocation operation
[ https://issues.apache.org/jira/browse/SPARK-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-12147.
----------------------------------
Resolution: Incomplete
> Off heap storage and dynamicAllocation operation
> ------------------------------------------------
>
> Key: SPARK-12147
> URL: https://issues.apache.org/jira/browse/SPARK-12147
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.5.2
> Environment: Cloudera Hadoop 2.6.0-cdh5.4.8
> Tachyon 0.7.1
> Yarn
> Reporter: Rares Mirica
> Priority: Minor
> Labels: bulk-closed
> Attachments: spark-defaults.conf
>
>
> For the purpose of increasing computation density and efficiency, I set out to test off-heap storage (using Tachyon) with dynamicAllocation enabled.
> Following the available documentation (the programming guide for Spark 1.5.2), I expected data to be cached in Tachyon for the lifetime of the application (driver instance), or until unpersist() is called. This expectation was supported by the doc: "Cached data is not lost if individual executors crash.", where I take "crash" to also cover graceful decommission. Furthermore, the graceful-decommission description in the job-scheduling document also hints at preserving cached data through off-heap storage.
> Seeing that Tachyon has now matured to the point where these promises are well within reach, I consider it a bug that, upon graceful decommission of an executor, the off-heap data is deleted (presumably as part of the cleanup phase).
> Needless to say, preserving off-heap persisted data after graceful decommission under dynamic allocation would yield significant improvements in resource utilization, especially on YARN, where executors occupy compute "slots" even when idle. After a long, expensive computation that takes advantage of the dynamically scaled executors, subsequent Spark jobs can use the cached data while the compute resources are released for other cluster tasks.
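> For reference, the setup under test would look roughly like the following in spark-defaults.conf (a sketch only; the Tachyon master URL, base directory, and timeout values shown here are assumptions, not the contents of the attached file):
>
>     spark.dynamicAllocation.enabled                    true
>     spark.shuffle.service.enabled                      true
>     spark.dynamicAllocation.cachedExecutorIdleTimeout  30min
>     spark.externalBlockStore.url                       tachyon://tachyon-master:19998
>     spark.externalBlockStore.baseDir                   /spark
>
> The data itself is then persisted with StorageLevel.OFF_HEAP (e.g. rdd.persist(StorageLevel.OFF_HEAP)) and, per the documentation quoted above, should survive the loss of the executor that wrote it.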
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org