Posted to issues@spark.apache.org by "Matt Cheah (Jira)" <ji...@apache.org> on 2020/07/31 00:10:00 UTC
[jira] [Created] (SPARK-32504) Shuffle Storage API: Dynamic updates of shuffle metadata
Matt Cheah created SPARK-32504:
----------------------------------
Summary: Shuffle Storage API: Dynamic updates of shuffle metadata
Key: SPARK-32504
URL: https://issues.apache.org/jira/browse/SPARK-32504
Project: Spark
Issue Type: Sub-task
Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah
When using external storage for shuffles through the shuffle storage API, it is often desirable to dynamically update the metadata associated with a shuffle. Plugins can already implement this metadata via https://issues.apache.org/jira/browse/SPARK-31801. For example:
# If data is stored in a replicated manner and the number of replicas changes, then we want the metadata stored on the driver to reflect the new number of replicas and where they are located.
# If data is stored on the mapper's local disk, but is asynchronously backed up to some external storage medium, then we want to know when a backup is available externally.
To achieve this, we would need to pass a hook for updating the shuffle metadata to the shuffle executor components, which sit at the root of the plugin tree on the executor side. The executor would establish an RPC connection with the driver and send messages to update the shuffle metadata accordingly.
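As a rough illustration of the proposed flow, here is a minimal in-memory sketch in Java. Everything in it is an assumption made for illustration: the names ShuffleMetadataUpdater, DriverShuffleMetadata and BackupAwareExecutorComponents, the location strings, and the method signatures are hypothetical and are not part of Spark's shuffle storage API. The RPC connection is replaced by a direct method call so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical hook handed to the executor-side plugin root (not a Spark interface).
interface ShuffleMetadataUpdater {
    void updateMetadata(int shuffleId, long mapTaskId, List<String> locations);
}

// Stand-in for the driver's record of where each map output can be read from.
final class DriverShuffleMetadata {
    private final Map<String, List<String>> locationsByMapOutput = new ConcurrentHashMap<>();

    // In the proposal this would be reached through an RPC message from the executor.
    void handleUpdate(int shuffleId, long mapTaskId, List<String> locations) {
        List<String> copy = Collections.unmodifiableList(new ArrayList<>(locations));
        locationsByMapOutput.put(shuffleId + ":" + mapTaskId, copy);
    }

    List<String> locationsOf(int shuffleId, long mapTaskId) {
        List<String> found = locationsByMapOutput.get(shuffleId + ":" + mapTaskId);
        return found == null ? Collections.<String>emptyList() : found;
    }
}

// Hypothetical executor-side plugin root that receives the hook at initialization.
final class BackupAwareExecutorComponents {
    private final String executorId;
    private ShuffleMetadataUpdater updater;

    BackupAwareExecutorComponents(String executorId) {
        this.executorId = executorId;
    }

    void initializeExecutor(ShuffleMetadataUpdater updater) {
        this.updater = updater;
    }

    // The map output initially exists only on the mapper's local disk.
    void onMapOutputWritten(int shuffleId, long mapTaskId) {
        updater.updateMetadata(shuffleId, mapTaskId,
            Arrays.asList("local://" + executorId));
    }

    // Once the asynchronous backup finishes, report the external copy as well.
    void onBackupCompleted(int shuffleId, long mapTaskId, String externalLocation) {
        updater.updateMetadata(shuffleId, mapTaskId,
            Arrays.asList("local://" + executorId, externalLocation));
    }
}

final class ShuffleMetadataDemo {
    public static void main(String[] args) {
        DriverShuffleMetadata driver = new DriverShuffleMetadata();
        BackupAwareExecutorComponents executor = new BackupAwareExecutorComponents("exec-1");
        // The method reference plays the role of the RPC client in this sketch.
        executor.initializeExecutor(driver::handleUpdate);

        executor.onMapOutputWritten(0, 7L);
        executor.onBackupCompleted(0, 7L, "external://backup/shuffle-0/map-7");
        System.out.println(driver.locationsOf(0, 7L));
    }
}
```

The sketch covers the second example above (local disk plus asynchronous backup). The replica-count case would follow the same pattern, with the plugin calling the hook whenever the set of replica locations changes.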