Posted to issues@spark.apache.org by "jinhai (Jira)" <ji...@apache.org> on 2021/10/22 11:02:00 UTC

[jira] [Issue Comment Deleted] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading

     [ https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jinhai updated SPARK-37006:
---------------------------
    Comment: was deleted

(was: hi [~Ngone51], can you review this issue for me?)

> MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37006
>                 URL: https://issues.apache.org/jira/browse/SPARK-37006
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.1.2
>            Reporter: jinhai
>            Priority: Major
>
> When ShuffleBlockFetcherIterator.fetchHostLocalBlocks runs, it has to send an RPC request through ExternalBlockStoreClient or NettyBlockTransferService to obtain the hostLocalDirs value. It then reads the shuffle data using blockId and localDirs.
> We can add localDirs to the BlockManagerId class of MapStatus, so that localDirs is available directly when fetching host-local blocks, without sending an RPC request.
> The benefits are:
> 1. No RPC request is needed to obtain the localDirs value in fetchHostLocalBlocks.
> 2. When the external shuffle service is enabled, there is no need to register ExecutorShuffleInfo in the ExternalShuffleBlockResolver class, or to persist the ExecutorShuffleInfo data there through LevelDB.
> 3. There is no need to cache host-local dirs in the HostLocalDirManager class.
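
The difference between the current flow and the proposed one can be sketched with a small self-contained model. This is not Spark code: the classes below are simplified stand-ins that only borrow the names BlockManagerId and MapStatus, the local_dirs field is the hypothetical addition described in the issue, RpcDirLookup is an invented helper that counts round trips, and the directory-selection hash is illustrative rather than Spark's real layout.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BlockManagerId:
    """Simplified stand-in for Spark's BlockManagerId."""
    executor_id: str
    host: str
    port: int
    # Hypothetical field proposed in the issue; assumed here, not in Spark 3.1.2.
    local_dirs: Tuple[str, ...] = ()


@dataclass(frozen=True)
class MapStatus:
    """Simplified stand-in for Spark's MapStatus (block sizes omitted)."""
    location: BlockManagerId
    map_id: int


class RpcDirLookup:
    """Invented helper modelling the current flow: one RPC per dirs lookup."""

    def __init__(self, dirs_by_executor: Dict[str, Tuple[str, ...]]):
        self._dirs_by_executor = dirs_by_executor
        self.rpc_calls = 0

    def get_host_local_dirs(self, executor_id: str) -> Tuple[str, ...]:
        self.rpc_calls += 1  # each call stands for a network round trip
        return self._dirs_by_executor[executor_id]


def shuffle_data_path(local_dirs: Tuple[str, ...], shuffle_id: int, map_id: int) -> str:
    """Pick one local dir for a shuffle data file.

    The file name and the CRC-based choice are illustrative assumptions;
    Spark's actual on-disk layout is different.
    """
    filename = f"shuffle_{shuffle_id}_{map_id}_0.data"
    index = zlib.crc32(filename.encode()) % len(local_dirs)
    return f"{local_dirs[index]}/{filename}"


def resolve_current(status: MapStatus, shuffle_id: int, rpc: RpcDirLookup) -> str:
    """Current flow: ask for the executor's local dirs over RPC, then resolve."""
    dirs = rpc.get_host_local_dirs(status.location.executor_id)
    return shuffle_data_path(dirs, shuffle_id, status.map_id)


def resolve_proposed(status: MapStatus, shuffle_id: int) -> str:
    """Proposed flow: the dirs travel with the MapStatus, so no RPC is made."""
    return shuffle_data_path(status.location.local_dirs, shuffle_id, status.map_id)
```

With the same directory list on both sides, resolve_current and resolve_proposed return the same path. The only difference is that the first increments rpc_calls on every lookup, while the second never touches the RPC helper.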



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
