Posted to issues@spark.apache.org by "jinhai (Jira)" <ji...@apache.org> on 2021/10/22 11:02:00 UTC

[jira] [Comment Edited] (SPARK-37006) MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading

    [ https://issues.apache.org/jira/browse/SPARK-37006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429079#comment-17429079 ] 

jinhai edited comment on SPARK-37006 at 10/22/21, 11:01 AM:
------------------------------------------------------------

Alternatively, could we derive localDirs from appId and execId, similar to how DiskBlockManager.getFile resolves a file? Then we would not need to store localDirs in MapStatus at all; adding appId to MapStatus would be enough.
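
To make the idea concrete, here is a minimal Java sketch; it is not Spark code. It assumes a hypothetical directory layout of <root>/<appId>/executor-<execId>, which Spark does not guarantee today, and the class DerivedLocalDirs and its methods are invented for illustration. The resolve step picks a directory and sub-directory from a hash of the file name, in the spirit of DiskBlockManager.getFile.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark.
final class DerivedLocalDirs {
    private DerivedLocalDirs() {}

    // ASSUMPTION: every executor keeps its files under
    // <root>/<appId>/executor-<execId> for each configured root directory.
    static List<Path> derive(List<String> roots, String appId, String execId) {
        List<Path> dirs = new ArrayList<>();
        for (String root : roots) {
            dirs.add(Paths.get(root, appId, "executor-" + execId));
        }
        return dirs;
    }

    // Chooses a local dir and a two-hex-digit sub-directory from the hash
    // of the file name, so any reader on the host computes the same path.
    static Path resolve(List<Path> localDirs, int subDirsPerLocalDir, String fileName) {
        int hash = nonNegativeHash(fileName);
        Path dir = localDirs.get(hash % localDirs.size());
        int subDirId = (hash / localDirs.size()) % subDirsPerLocalDir;
        return dir.resolve(String.format("%02x", subDirId)).resolve(fileName);
    }

    // Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
    static int nonNegativeHash(String s) {
        int h = s.hashCode();
        return h == Integer.MIN_VALUE ? 0 : Math.abs(h);
    }
}
```

With such a layout, a reader on the same host would only need the root directories from its own configuration plus the appId and execId of the map task to locate a shuffle file. This only works if the directory names contain no random component.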


was (Author: csbliss):
Or whether we can generate localDirs based on appId and execId, just like DiskBlockManager.getFile, so that we don't need to save localDirs in MapStatus, just add appId.

> MapStatus adds localDirs to avoid the rpc request by method getHostLocalDirs when shuffle reading
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37006
>                 URL: https://issues.apache.org/jira/browse/SPARK-37006
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.1.2
>            Reporter: jinhai
>            Priority: Major
>
> When ShuffleBlockFetcherIterator.fetchHostLocalBlocks runs, it has to send an RPC request through ExternalBlockStoreClient or NettyBlockTransferService to obtain the hostLocalDirs value. It then reads the shuffle data using blockId and localDirs.
> We can add localDirs to the BlockManagerId class carried by MapStatus, so that localDirs is available directly when fetching host-local blocks, without sending an RPC request.
> The benefits are:
> 1. No RPC request is needed to obtain the localDirs value in fetchHostLocalBlocks.
> 2. When the external shuffle service is enabled, there is no need to register ExecutorShuffleInfo in the ExternalShuffleBlockResolver class, nor to persist the ExecutorShuffleInfo data there through LevelDB.
> 3. There is no need to cache host-local dirs in the HostLocalDirManager class.
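
The proposal in the description can be sketched roughly as follows. This is a simplified Java illustration, not Spark's implementation: Location is a hypothetical stand-in for BlockManagerId extended with the proposed localDirs field, and classify is an invented helper that shows how a reader could choose the host-local disk path without first asking for the directories over RPC.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; the real Spark classes differ.
final class HostLocalReadSketch {
    private HostLocalReadSketch() {}

    // Stand-in for BlockManagerId plus the PROPOSED localDirs field.
    static final class Location {
        final String executorId;
        final String host;
        final List<String> localDirs;

        Location(String executorId, String host, List<String> localDirs) {
            this.executorId = executorId;
            this.host = host;
            this.localDirs = localDirs;
        }
    }

    enum FetchPath { SAME_EXECUTOR, HOST_LOCAL_DISK, REMOTE }

    // Decides how the reader (self) should fetch a block written at `block`.
    static FetchPath classify(Location self, Location block) {
        boolean sameHost = block.host.equals(self.host);
        if (sameHost && block.executorId.equals(self.executorId)) {
            return FetchPath.SAME_EXECUTOR;
        }
        // The dirs travel with the map output location, so no extra
        // round trip is needed before reading the file from disk.
        if (sameHost && !block.localDirs.isEmpty()) {
            return FetchPath.HOST_LOCAL_DISK;
        }
        return FetchPath.REMOTE;
    }
}
```

The trade-off in this sketch is that every map output location becomes larger, since the directory list is serialized along with it.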


