You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/03/04 00:46:01 UTC

[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/11502

    [SPARK-13659] Refactor BlockStore put*() APIs to remove returnValues

    In preparation for larger refactoring, this patch removes the confusing `returnValues` option from the BlockStore put() APIs: returning the value is only useful in one place (caching) and in other situations, such as block replication, it's simpler to put() and then get().
    
    As part of this change, I needed to refactor `BlockManager.doPut()`'s block replication code. I also changed `doPut()` to access the memory and disk stores directly rather than calling them through the BlockStore interface; this is in anticipation of a followup patch to remove the BlockStore interface so that the disk store can expose a binary-data-oriented API which is not concerned with Java objects or serialization.
    
    These changes should be covered by the existing storage unit tests. The best way to review this patch is probably to look at the individual commits, all of which are small and have useful descriptions to guide the review.
    
    /cc @davies for review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-returnvalues

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11502
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r54973269
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -734,9 +740,9 @@ private[spark] class BlockManager(
        * @param keepReadLock if true, this method will hold the read lock when it returns (even if the
        *                     block already exists). If false, this method will hold no locks when it
        *                     returns.
    -   * @return `Some(PutResult)` if the block did not exist and could not be successfully cached,
    -   *         or None if the block already existed or was successfully stored (fully consuming
    -   *         the input data / input iterator).
    +   * @return `Some(Option[Iterator[Any])` if the block did not exist and could not be successfully
    --- End diff --
    
    Sounds good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192556323
  
    **[Test build #52499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52499/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192486502
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192024300
  
    **[Test build #52419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52419/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192071746
  
    **[Test build #52438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52438/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55115257
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -744,7 +750,7 @@ private[spark] class BlockManager(
           level: StorageLevel,
           tellMaster: Boolean = true,
           effectiveStorageLevel: Option[StorageLevel] = None,
    -      keepReadLock: Boolean = false): Option[PutResult] = {
    +      keepReadLock: Boolean = false): Option[Option[Iterator[Any]]] = {
    --- End diff --
    
    `doPut()` can be called to write either bytes or an iterator of values (since `doPut` is used in the implementation of both `putIterator` and `putBytes`). 
    
    Imagine that `putBytes()` fails because there was not enough space to store the block in memory and it could not be dropped to disk. In this case, the `put()` call has failed but we can't return an iterator.
    
    Thus, we need to make a case class so that we can convey the three separate outcomes (put of bytes failed, put of iterator failed, and put succeeded or block was already present).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192556502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52499/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11502


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192535571
  
    **[Test build #52489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52489/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192107901
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192107904
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52438/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192567014
  
    Alright, I've updated this to address the review comments. Note, however, that most of the code that I changed in my latest commit is going to be immediately deleted / rewritten in the larger refactoring patch that will follow this one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192107709
  
    **[Test build #52438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52438/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192538639
  
    **[Test build #52499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52499/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192055076
  
    **[Test build #52419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52419/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-193618865
  
    LGTM I'm merging this into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55319064
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
    @@ -54,15 +54,12 @@ private[spark] class DiskStore(blockManager: BlockManager, diskManager: DiskBloc
         val finishTime = System.currentTimeMillis
         logDebug("Block %s stored as %s file on disk in %d ms".format(
           file.getName, Utils.bytesToString(bytes.limit), finishTime - startTime))
    -    PutResult(bytes.limit(), Right(bytes.duplicate()))
       }
     
       override def putIterator(
           blockId: BlockId,
           values: Iterator[Any],
    -      level: StorageLevel,
    -      returnValues: Boolean): PutResult = {
    -
    +      level: StorageLevel): Right[Iterator[Any], Long] = {
    --- End diff --
    
    This return type is really wonky. We can fix this later though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192535709
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52489/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192055291
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52419/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192033154
  
    I'm not familiar with BlockManager, the changes looks straightforward to me, only one minor question.
    
    LGTM over all. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192556501
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192464035
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192535707
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55105755
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -801,43 +798,45 @@ private[spark] class BlockManager(
         }
     
         var blockWasSuccessfullyStored = false
    -    var result: PutResult = null
    +    var iteratorFromFailedMemoryStorePut: Option[Iterator[Any]] = None
     
         putBlockInfo.synchronized {
           logTrace("Put for block %s took %s to get into synchronized block"
             .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
           try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +        if (putLevel.useMemory) {
    +          // Put it in memory first, even if it also has useDisk set to true;
    +          // We will drop it to disk later if the memory store can't hold it.
    +          data match {
    +            case IteratorValues(iterator) =>
    +              memoryStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    +                case Left(iter) =>
    +                  iteratorFromFailedMemoryStorePut = Some(iter)
    +              }
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              size = bytes.limit()
    +              memoryStore.putBytes(blockId, bytes, putLevel)
               }
    -        }
    -
    -        // Actually put the values
    -        result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator(), putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    +        } else if (putLevel.useDisk) {
    +          data match {
    +            case IteratorValues(iterator) =>
    +              diskStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    --- End diff --
    
    can this never be `Left(iter)`? I think we should still match it to throw a better exception than "match error". If the concern is that we will remove `BlockStore` in a future patch then we should add a TODO here instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r54972357
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -734,9 +740,9 @@ private[spark] class BlockManager(
        * @param keepReadLock if true, this method will hold the read lock when it returns (even if the
        *                     block already exists). If false, this method will hold no locks when it
        *                     returns.
    -   * @return `Some(PutResult)` if the block did not exist and could not be successfully cached,
    -   *         or None if the block already existed or was successfully stored (fully consuming
    -   *         the input data / input iterator).
    +   * @return `Some(Option[Iterator[Any])` if the block did not exist and could not be successfully
    --- End diff --
    
    I haven't done a full review yet but this return type is super confusing...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192464037
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52478/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55107675
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -432,98 +432,104 @@ private[spark] class BlockManager(
             logDebug(s"Block $blockId was not found")
             None
           case Some(info) =>
    -        val level = info.level
    -        logDebug(s"Level for block $blockId is $level")
    -
    -        // Look for the block in memory
    -        if (level.useMemory) {
    -          logDebug(s"Getting block $blockId from memory")
    -          val result = if (asBlockResult) {
    -            memoryStore.getValues(blockId).map { iter =>
    -              val ci = CompletionIterator[Any, Iterator[Any]](iter, releaseLock(blockId))
    -              new BlockResult(ci, DataReadMethod.Memory, info.size)
    -            }
    -          } else {
    -            memoryStore.getBytes(blockId)
    -          }
    -          result match {
    -            case Some(values) =>
    -              return result
    -            case None =>
    -              logDebug(s"Block $blockId not found in memory")
    -          }
    +        doGetLocal(blockId, info, asBlockResult)
    +    }
    +  }
    +
    +  private def doGetLocal(
    --- End diff --
    
    Yep, can do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55103493
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -432,98 +432,104 @@ private[spark] class BlockManager(
             logDebug(s"Block $blockId was not found")
             None
           case Some(info) =>
    -        val level = info.level
    -        logDebug(s"Level for block $blockId is $level")
    -
    -        // Look for the block in memory
    -        if (level.useMemory) {
    -          logDebug(s"Getting block $blockId from memory")
    -          val result = if (asBlockResult) {
    -            memoryStore.getValues(blockId).map { iter =>
    -              val ci = CompletionIterator[Any, Iterator[Any]](iter, releaseLock(blockId))
    -              new BlockResult(ci, DataReadMethod.Memory, info.size)
    -            }
    -          } else {
    -            memoryStore.getBytes(blockId)
    -          }
    -          result match {
    -            case Some(values) =>
    -              return result
    -            case None =>
    -              logDebug(s"Block $blockId not found in memory")
    -          }
    +        doGetLocal(blockId, info, asBlockResult)
    +    }
    +  }
    +
    +  private def doGetLocal(
    --- End diff --
    
    this assumes you already hold the read lock for the block. We should explicitly say so in a comment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192538367
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192596066
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192055593
  
    I think this failed an unrelated flaky test in `CommitFailureTestRelationSuite`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192410263
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192598070
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55108284
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -734,9 +740,9 @@ private[spark] class BlockManager(
        * @param keepReadLock if true, this method will hold the read lock when it returns (even if the
        *                     block already exists). If false, this method will hold no locks when it
        *                     returns.
    -   * @return `Some(PutResult)` if the block did not exist and could not be successfully cached,
    -   *         or None if the block already existed or was successfully stored (fully consuming
    -   *         the input data / input iterator).
    +   * @return `Some(Option[Iterator[Any])` if the block did not exist and could not be successfully
    +   *          cached, or None if the block already existed or was successfully stored (fully
    +   *          consuming the input data / input iterator).
    --- End diff --
    
    this paragraph is a little too low level. I would start by explaining what the return value means on a high level, e.g.
    ```
    @return iterator of values if the put failed. This is defined only if ...
    ```
    then go into what `Some` or `None` means.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192598071
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52504/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55107665
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -801,43 +798,45 @@ private[spark] class BlockManager(
         }
     
         var blockWasSuccessfullyStored = false
    -    var result: PutResult = null
    +    var iteratorFromFailedMemoryStorePut: Option[Iterator[Any]] = None
     
         putBlockInfo.synchronized {
           logTrace("Put for block %s took %s to get into synchronized block"
             .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
           try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +        if (putLevel.useMemory) {
    +          // Put it in memory first, even if it also has useDisk set to true;
    +          // We will drop it to disk later if the memory store can't hold it.
    +          data match {
    +            case IteratorValues(iterator) =>
    +              memoryStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    +                case Left(iter) =>
    +                  iteratorFromFailedMemoryStorePut = Some(iter)
    +              }
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              size = bytes.limit()
    +              memoryStore.putBytes(blockId, bytes, putLevel)
               }
    -        }
    -
    -        // Actually put the values
    -        result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator(), putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    +        } else if (putLevel.useDisk) {
    +          data match {
    +            case IteratorValues(iterator) =>
    +              diskStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    --- End diff --
    
    I didn't bother to comment this because this code is going away in my followup patch; the weird type will no longer be necessary once MemoryStore and DiskStore no longer share a common superclass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r54972851
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -734,9 +740,9 @@ private[spark] class BlockManager(
        * @param keepReadLock if true, this method will hold the read lock when it returns (even if the
        *                     block already exists). If false, this method will hold no locks when it
        *                     returns.
    -   * @return `Some(PutResult)` if the block did not exist and could not be successfully cached,
    -   *         or None if the block already existed or was successfully stored (fully consuming
    -   *         the input data / input iterator).
    +   * @return `Some(Option[Iterator[Any])` if the block did not exist and could not be successfully
    --- End diff --
    
    Basically, the problem is that we need to distinguish between three outcomes: success, failure while storing bytes, and failure while storing an iterator. This is only called in one place internally, so I got lazy and neglected to make a simpler return type to capture this distinction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55107596
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -801,43 +798,45 @@ private[spark] class BlockManager(
         }
     
         var blockWasSuccessfullyStored = false
    -    var result: PutResult = null
    +    var iteratorFromFailedMemoryStorePut: Option[Iterator[Any]] = None
     
         putBlockInfo.synchronized {
           logTrace("Put for block %s took %s to get into synchronized block"
             .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
           try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +        if (putLevel.useMemory) {
    +          // Put it in memory first, even if it also has useDisk set to true;
    +          // We will drop it to disk later if the memory store can't hold it.
    +          data match {
    +            case IteratorValues(iterator) =>
    +              memoryStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    +                case Left(iter) =>
    +                  iteratorFromFailedMemoryStorePut = Some(iter)
    +              }
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              size = bytes.limit()
    +              memoryStore.putBytes(blockId, bytes, putLevel)
               }
    -        }
    -
    -        // Actually put the values
    -        result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator(), putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    +        } else if (putLevel.useDisk) {
    +          data match {
    +            case IteratorValues(iterator) =>
    +              diskStore.putIterator(blockId, iterator(), putLevel) match {
    +                case Right(s) =>
    +                  size = s
    --- End diff --
    
    Nope, it can't; the return type of `putIterator` is actually `Right` (check the signature). That's why this isn't complaining about a partial match.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r54971762
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -744,7 +750,7 @@ private[spark] class BlockManager(
           level: StorageLevel,
           tellMaster: Boolean = true,
           effectiveStorageLevel: Option[StorageLevel] = None,
    -      keepReadLock: Boolean = false): Option[PutResult] = {
    +      keepReadLock: Boolean = false): Option[Option[Iterator[Any]]] = {
    --- End diff --
    
    Do we really need this two level options?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192412645
  
    **[Test build #52478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52478/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55106959
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -868,34 +867,23 @@ private[spark] class BlockManager(
         }
         logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
    -    // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    -      data match {
    -        case ByteBufferValues(bytes) =>
    -          if (replicationFuture != null) {
    -            Await.ready(replicationFuture, Duration.Inf)
    -          }
    -        case _ =>
    -          if (blockWasSuccessfullyStored) {
    -            val remoteStartTime = System.currentTimeMillis
    -            // Serialize the block if not already done
    -            if (bytesAfterPut == null) {
    -              if (valuesAfterPut == null) {
    -                throw new SparkException(
    -                  "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    -              }
    -              bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    -            }
    -            replicate(blockId, bytesAfterPut, putLevel)
    -            logDebug("Put block %s remotely took %s"
    -              .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    -          }
    +    if (replicationFuture != null) {
    +      // Wait for asynchronous replication to finish
    +      Await.ready(replicationFuture, Duration.Inf)
    +    } else if (putLevel.replication > 1 && blockWasSuccessfullyStored) {
    +      val remoteStartTime = System.currentTimeMillis
    +      val bytesToReplicate = doGetLocal(blockId, putBlockInfo, asBlockResult = false).getOrElse {
    +        throw new SparkException(s"Block $blockId was not found even though it was just stored")
    +      }.asInstanceOf[ByteBuffer]
    --- End diff --
    
    style: slightly prefer
    ```
    doGetLocal(...)
      .map(_.asInstanceOf[ByteBuffer])
      .getOrElse { throw new SparkException(...) }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #11502: [SPARK-13659] Refactor BlockStore put*() APIs to ...

Posted by pzz2011 <gi...@git.apache.org>.
Github user pzz2011 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r75572673
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -432,98 +432,105 @@ private[spark] class BlockManager(
             logDebug(s"Block $blockId was not found")
             None
           case Some(info) =>
    -        val level = info.level
    -        logDebug(s"Level for block $blockId is $level")
    -
    -        // Look for the block in memory
    -        if (level.useMemory) {
    -          logDebug(s"Getting block $blockId from memory")
    -          val result = if (asBlockResult) {
    -            memoryStore.getValues(blockId).map { iter =>
    -              val ci = CompletionIterator[Any, Iterator[Any]](iter, releaseLock(blockId))
    -              new BlockResult(ci, DataReadMethod.Memory, info.size)
    -            }
    -          } else {
    -            memoryStore.getBytes(blockId)
    -          }
    -          result match {
    -            case Some(values) =>
    -              return result
    -            case None =>
    -              logDebug(s"Block $blockId not found in memory")
    -          }
    +        doGetLocal(blockId, info, asBlockResult)
    +    }
    +  }
    +
    +  private def doGetLocal(
    +      blockId: BlockId,
    +      info: BlockInfo,
    +      asBlockResult: Boolean): Option[Any] = {
    +    val level = info.level
    +    logDebug(s"Level for block $blockId is $level")
    +
    +    // Look for the block in memory
    +    if (level.useMemory) {
    +      logDebug(s"Getting block $blockId from memory")
    +      val result = if (asBlockResult) {
    --- End diff --
    
    Hey? excuse me, @JoshRosen  can I ask you a question(may be a little stupid...)? what's the `asBlockResult` mean? why here should use it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192567237
  
    **[Test build #52504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52504/consoleFull)** for PR 11502 at commit [`910f997`](https://github.com/apache/spark/commit/910f9977e5537410da994a0e93096dd7e1e8dbfe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192595554
  
    **[Test build #52503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52503/consoleFull)** for PR 11502 at commit [`3d8f479`](https://github.com/apache/spark/commit/3d8f47931915e39feea45f4be30280bb7b472a4b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192487388
  
    **[Test build #52489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52489/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192596069
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52503/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192597678
  
    **[Test build #52504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52504/consoleFull)** for PR 11502 at commit [`910f997`](https://github.com/apache/spark/commit/910f9977e5537410da994a0e93096dd7e1e8dbfe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192567042
  
    **[Test build #52503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52503/consoleFull)** for PR 11502 at commit [`3d8f479`](https://github.com/apache/spark/commit/3d8f47931915e39feea45f4be30280bb7b472a4b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192055287
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192070455
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r54972981
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -734,9 +740,9 @@ private[spark] class BlockManager(
        * @param keepReadLock if true, this method will hold the read lock when it returns (even if the
        *                     block already exists). If false, this method will hold no locks when it
        *                     returns.
    -   * @return `Some(PutResult)` if the block did not exist and could not be successfully cached,
    -   *         or None if the block already existed or was successfully stored (fully consuming
    -   *         the input data / input iterator).
    +   * @return `Some(Option[Iterator[Any])` if the block did not exist and could not be successfully
    --- End diff --
    
    This will be much easier to clean up if we split the monolithic `doPut()` into a `doPutBytes()` and `doPutIterator()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11502#issuecomment-192463827
  
    **[Test build #52478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52478/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11502#discussion_r55108022
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -744,7 +750,7 @@ private[spark] class BlockManager(
           level: StorageLevel,
           tellMaster: Boolean = true,
           effectiveStorageLevel: Option[StorageLevel] = None,
    -      keepReadLock: Boolean = false): Option[PutResult] = {
    +      keepReadLock: Boolean = false): Option[Option[Iterator[Any]]] = {
    --- End diff --
    
    I checked the callsites. Currently we never actually use the inner level of option. The only place where we look at the `Some` value is in `getOrElseUpdate`. There we just do `x.get.get` anyway, so this can just be an `Option[Iterator[Any]]`.
    
    If there's another boolean that we need to return, then I would just make a private case class even if it's called in only 1 place. Right now it's unclear what `Some(None)` means, for example. Anything we can think of here is better than nested options.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org