Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/08 08:27:58 UTC

[GitHub] [spark] attilapiros opened a new pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host

URL: https://github.com/apache/spark/pull/24554
 
 
   ## What changes were proposed in this pull request?
   
   Before this PR, fetching a disk-persisted RDD block always used the network to get the requested block content, even when the source executor and the fetching executor were running on the same host.
   
   The idea of accessing another executor's local disk files by reading the disk directly comes from the external shuffle service, where the local dirs are stored for each executor (block manager).
   
   To make this possible, the following changes were made:
   - The `RegisterBlockManager` message is extended with `localDirs`, which the block manager master stores for each block manager as a new property of `BlockManagerInfo`.
   - `GetLocationsAndStatus` is extended with the requester host.
   - `BlockLocationsAndStatus` (the reply to the `GetLocationsAndStatus` message) is extended with an option of local directories, which is filled with the local directories of an executor on the same host (if there is one; otherwise `None` is used). This is where the block content can be read from.
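   
   The message flow above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the case classes are simplified stand-ins that share only their names with the real Spark messages, and `LocalDirsSketch`, `MasterSketch`, `addBlock`, `handle`, the `requesterHost` field name, and the use of `Seq[String]` and plain string block IDs are all assumptions made for the example. It also ignores storage levels, whereas the PR only concerns disk-persisted RDD blocks.

   ```scala
   import scala.collection.mutable

   object LocalDirsSketch {
     // Simplified stand-in: the real BlockManagerId carries more fields.
     final case class BlockManagerId(executorId: String, host: String)

     // Registration carries the executor's local directories.
     final case class RegisterBlockManager(id: BlockManagerId, localDirs: Seq[String])

     // The request names the host of the requester (field name is assumed).
     final case class GetLocationsAndStatus(blockId: String, requesterHost: String)

     // The reply carries local directories only when a same-host executor holds the block.
     final case class BlockLocationsAndStatus(
         locations: Seq[BlockManagerId],
         localDirs: Option[Seq[String]])

     // Hypothetical master-side bookkeeping, reduced to two maps.
     final class MasterSketch {
       private val dirsByManager = mutable.Map.empty[BlockManagerId, Seq[String]]
       private val blockLocations = mutable.Map.empty[String, Seq[BlockManagerId]]

       // Remember the local directories of each registering block manager.
       def register(msg: RegisterBlockManager): Unit =
         dirsByManager.update(msg.id, msg.localDirs)

       // Record that `holder` stores the given block.
       def addBlock(blockId: String, holder: BlockManagerId): Unit =
         blockLocations.update(blockId, blockLocations.getOrElse(blockId, Seq.empty) :+ holder)

       // Returns None for an unknown block; otherwise the locations plus the
       // local dirs of a holder on the requester's host, if there is one.
       def handle(msg: GetLocationsAndStatus): Option[BlockLocationsAndStatus] =
         blockLocations.get(msg.blockId).map { locations =>
           val sameHostDirs =
             locations.find(_.host == msg.requesterHost).flatMap(dirsByManager.get)
           BlockLocationsAndStatus(locations, sameHostDirs)
         }
     }

     def main(args: Array[String]): Unit = {
       val master = new MasterSketch
       val exec1 = BlockManagerId("1", "hostA")
       master.register(RegisterBlockManager(exec1, Seq("/tmp/spark-exec1")))
       master.addBlock("rdd_0_0", exec1)

       // A requester on hostA gets directories it can read directly;
       // a requester on hostB gets None and would fall back to the network.
       println(master.handle(GetLocationsAndStatus("rdd_0_0", "hostA")).flatMap(_.localDirs))
       println(master.handle(GetLocationsAndStatus("rdd_0_0", "hostB")).flatMap(_.localDirs))
     }
   }
   ```

   Keeping the directories as an `Option` in the reply lets the requester choose between the direct disk read and the existing network fetch based on a single field.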
   
   Shuffle blocks are out of scope for this PR: a separate PR will be opened for them (under another Jira issue).
   
   ## How was this patch tested?
   
   With a new unit test in `BlockManagerSuite`. See the test prefixed by "SPARK-27622: avoid the network when block requested from same host".
