Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 05:35:29 UTC
[jira] [Updated] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers
[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-636:
-------------------------------
Labels: bulk-closed (was: )
> Add mechanism to run system management/configuration tasks on all workers
> -------------------------------------------------------------------------
>
> Key: SPARK-636
> URL: https://issues.apache.org/jira/browse/SPARK-636
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Josh Rosen
> Priority: Major
> Labels: bulk-closed
>
> It would be useful to have a mechanism to run a task on all workers in order to perform system management tasks, such as purging caches or changing system properties. This is useful for automated experiments and benchmarking; I don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { _ => () }
> {code}
> but this does not guarantee that every worker runs a task, and it requires my user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests. For example, I might want to drop cached RDDs, purge shuffle data, and call {{System.gc()}} between test runs. It makes sense to incorporate some of this functionality, such as dropping cached RDDs, into Spark itself, but it might be helpful to have a general mechanism for running ad-hoc tasks like {{System.gc()}}.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org