You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/31 06:13:13 UTC

[GitHub] [spark] attilapiros commented on a change in pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host

attilapiros commented on a change in pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host
URL: https://github.com/apache/spark/pull/24554#discussion_r289267957
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##########
 @@ -827,10 +832,57 @@ private[spark] class BlockManager(
    */
   private[spark] def getRemoteValues[T: ClassTag](blockId: BlockId): Option[BlockResult] = {
     val ct = implicitly[ClassTag[T]]
-    getRemoteManagedBuffer(blockId).map { data =>
+    getRemoteBlock(blockId, (data: ManagedBuffer) => {
       val values =
         serializerManager.dataDeserializeStream(blockId, data.createInputStream())(ct)
       new BlockResult(values, DataReadMethod.Network, data.size)
+    })
+  }
+
+  /**
+   * Get the remote block and transform it to the provided data type.
+   *
+   * If the block is persisted to the disk and stored at an executor running on the same host then
+   * first it is tried to be accessed using the local directories of the other executor directly.
+   * If the file is successfully identified then tried to be transformed by the provided
+   * transformation function which expected to open the file. If there is any exception during this
+   * transformation then block access falls back to fetching it from the remote executor via the
+   * network.
+   *
+   * @param blockId identifies the block to get
+   * @param bufferTransformer this transformer expected to open the file if the block is backed by a
+   *                          file by this it is guaranteed the whole content can be loaded
+   * @tparam T result type
+   * @return
+   */
+   private[spark] def getRemoteBlock[T](
+       blockId: BlockId,
+       bufferTransformer: ManagedBuffer => T): Option[T] = {
 
 Review comment:
   Of course I know the multiple parameter lists. I just followed here this rule from the style guide: [Do NOT use multi-parameter lists.](https://github.com/databricks/scala-style-guide#java-multi-param-list) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org