Posted to reviews@spark.apache.org by squito <gi...@git.apache.org> on 2015/04/11 03:19:22 UTC

[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize

GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/5463

    [Spark-6839] BlockManger.dataDeserialize 

    https://issues.apache.org/jira/browse/SPARK-6839
    
    This change touches a surprisingly large amount of code to make sure that `BlockManager.dataDeserialize` always gets passed something which can ensure the input stream gets closed.  I trimmed out what I could, but almost all paths through `BlockManager` might end up calling `dataDeserialize` (even when blocks are being `put`).
    
    A `TaskContext` isn't available at all of the relevant call sites, so I made a new abstraction, `ResourceCleaner`.  `TaskContext` extends `ResourceCleaner`, so we use `TaskContext` where we have one; otherwise there is a `SimpleResourceCleaner` that just keeps a list of functions to run in a `finally` block.
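    The abstraction described above can be sketched in a few lines. This is only a hypothetical illustration: the trait and class names come from the description, but the method names (`addCleanup`, `doCleanup`) are assumptions, not the actual diff.

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hedged sketch of the ResourceCleaner abstraction; method names are
    // illustrative assumptions, not the PR's real API.
    trait ResourceCleaner {
      /** Register a cleanup function to run when this context completes. */
      def addCleanup(f: () => Unit): Unit
    }

    // Keeps a list of functions and runs them all, typically from a `finally`.
    class SimpleResourceCleaner extends ResourceCleaner {
      private val cleanups = ArrayBuffer[() => Unit]()
      override def addCleanup(f: () => Unit): Unit = { cleanups += f; () }
      def doCleanup(): Unit = cleanups.foreach(f => f())
    }
    ```

    A call site without a `TaskContext` would presumably create a `SimpleResourceCleaner`, pass it into `dataDeserialize`, and call `doCleanup()` in a `finally` block, while `TaskContext` would run the registered functions at task completion.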
    
    I also considered forcing `DeserializationStream.asIterator` to require a `ResourceCleaner`.  That way we'd force the *right* cleanup function to be used every time somebody called `stream.asIterator`.  However, I figure it is *possible* that you might read some of the stream and not necessarily want to close it.  I'm curious for the opinion of others on this.
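    On the caller side, the `finally`-block pattern looks roughly like the following. This is a self-contained sketch with made-up names: `withCleanup` stands in for a `SimpleResourceCleaner` plus a `finally` block, and the byte-array stream stands in for whatever `dataDeserialize` would actually read.

    ```scala
    import java.io.{ByteArrayInputStream, InputStream}
    import scala.collection.mutable.ArrayBuffer

    // Made-up helper: collect cleanup functions, run them all in a finally.
    def withCleanup[T](body: ((() => Unit) => Unit) => T): T = {
      val cleanups = ArrayBuffer[() => Unit]()
      try body(f => { cleanups += f; () })
      finally cleanups.foreach(f => f()) // runs even if deserialization throws
    }

    // The stream registers its own close(), so it is closed on every exit
    // path, including when the iterator is only partially consumed.
    val values = withCleanup { register =>
      val in: InputStream = new ByteArrayInputStream(Array[Byte](1, 2, 3))
      register(() => in.close())
      Iterator.continually(in.read()).takeWhile(_ != -1).toList
    }
    ```

    The point of keeping registration at the call site, rather than inside `asIterator`, is exactly the trade-off mentioned above: the caller knows whether it owns the whole stream.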
    
    `BroadcastSuite` is somewhat flaky when run on my laptop ... I'm pretty sure that is not related to these changes, but I guess we'll see what Jenkins says.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark SPARK-6839_dispose_bytebuffers

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5463
    
----
commit 345803bf316d214fbc027a3947ec4bdbb9e5ce0e
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T14:45:39Z

    change ByteBufferInputStream to do dispose in close(), rather than at end of stream

commit 5e7214ff5dcf4379137512e4fda8c29dd19bf40e
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T14:47:07Z

    add test for DeserializationStream (passed w/out changes)

commit 2053a15b5200ddf485dd6124766b892e085630f1
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T19:59:30Z

    every call to BlockManager.dataDeserialize requires a ResourceCleaner to ensure the stream gets closed

commit 32af41893d73dc6d0b490b5db930dc5d1014e4bd
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T20:23:33Z

    add test

commit 33c8be9217bf3137e4a1e65e6d8b14af9324fd6c
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T21:10:09Z

    rename

commit 76bf6f2cbd253b79a285c62c488abfa7fed43a09
Author: Imran Rashid <ir...@cloudera.com>
Date:   2015-04-10T21:33:49Z

    style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93576521
  
      [Test build #30377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30377/consoleFull) for   PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91737691
  
      [Test build #30064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30064/consoleFull) for   PR 5463 at commit [`e225ea7`](https://github.com/apache/spark/commit/e225ea7c01f72fbda3cfe2bdb9f3c92cb7f6b246).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93535270
  
      [Test build #30368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30368/consoleFull) for   PR 5463 at commit [`3a85734`](https://github.com/apache/spark/commit/3a857345431fe7140057d621431096b7be493391).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92454126
  
      [Test build #30177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30177/consoleFull) for   PR 5463 at commit [`cd05247`](https://github.com/apache/spark/commit/cd05247be944d798b20b7ccf0c8951272d7d8e7e).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-94077313
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30495/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92445803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30169/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93803696
  
      [Test build #30430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30430/consoleFull) for   PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93522789
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30351/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92179785
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30130/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92452706
  
      [Test build #30175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30175/consoleFull) for   PR 5463 at commit [`3241d66`](https://github.com/apache/spark/commit/3241d66c8df894d20407756a88960a92dbb1d962).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92186450
  
      [Test build #30134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30134/consoleFull) for   PR 5463 at commit [`15106ab`](https://github.com/apache/spark/commit/15106ab96806482276f3aec2480c9e0f41a359ed).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`





[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92490886
  
      [Test build #30175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30175/consoleFull) for   PR 5463 at commit [`3241d66`](https://github.com/apache/spark/commit/3241d66c8df894d20407756a88960a92dbb1d962).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-94052759
  
      [Test build #30495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30495/consoleFull) for   PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92179758
  
      [Test build #30130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30130/consoleFull) for   PR 5463 at commit [`29a22a9`](https://github.com/apache/spark/commit/29a22a97dc98b3c7773f249f7f4da736121cb802).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`





[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93624294
  
    **[Test build #30388 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30388/consoleFull)**     for PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7)     after a configured wait of `120m`.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-94077306
  
      [Test build #30495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30495/consoleFull) for   PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `case class UnresolvedAttribute(nameParts: Seq[String])`
      * `trait CaseConversionExpression `
      * `final class UTF8String extends Ordered[UTF8String] with Serializable `
      * `case class Exchange(`
      * `case class SortMergeJoin(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28190632
  
    --- Diff: core/src/test/scala/org/apache/spark/serializer/SerializerSuite.scala ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.serializer
    +
    +import java.io.EOFException
    +
    +import scala.reflect.ClassTag
    +
    +import org.scalatest.{Matchers, FunSuite}
    +
    +class SerializerSuite extends FunSuite with Matchers {
    +  test("DeserializationStream closes input at end") {
    +    val in = new IntDeserializationStream(2)
    --- End diff --
    
    this test passed before any changes were made.  I just wanted to double-check this for my own sake, and figured I might as well include it as an extra test.
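    For context, the behavior that test exercises is roughly the following pattern (a simplified, self-contained sketch — `IntStream` here is a stand-in for the `IntDeserializationStream` test helper, not the actual Spark class): an iterator over a deserialization stream that closes the underlying input as soon as it hits EOF.

    ```scala
    import java.io.EOFException

    // Simplified sketch: a "stream" that yields n ints, then throws EOFException.
    // asIterator closes the stream automatically when the end is reached.
    class IntStream(n: Int) {
      private var i = 0
      var closed = false

      def readInt(): Int = {
        if (i >= n) throw new EOFException()
        i += 1
        i
      }

      def close(): Unit = { closed = true }

      def asIterator: Iterator[Int] = new Iterator[Int] {
        // Prefetch one element so hasNext can detect EOF (and close) eagerly.
        private var nextValue: Option[Int] = tryRead()
        private def tryRead(): Option[Int] =
          try Some(readInt()) catch { case _: EOFException => close(); None }
        def hasNext: Boolean = nextValue.isDefined
        def next(): Int = { val v = nextValue.get; nextValue = tryRead(); v }
      }
    }
    ```

    Fully consuming the iterator is what triggers the close — which is exactly why a caller that abandons the iterator early needs some other cleanup hook.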




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93519270
  
      [Test build #30350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30350/consoleFull) for   PR 5463 at commit [`fd90a7a`](https://github.com/apache/spark/commit/fd90a7a588c2a9d889420a414af00df4f1234d13).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
    
     * This patch **adds the following new dependencies:**
       * `commons-math3-3.1.1.jar`
       * `snappy-java-1.1.1.6.jar`
    
     * This patch **removes the following dependencies:**
       * `commons-math3-3.4.1.jar`
       * `snappy-java-1.1.1.7.jar`





[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91738272
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30064/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91736218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30063/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92491779
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30180/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91738270
  
      [Test build #30064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30064/consoleFull) for   PR 5463 at commit [`e225ea7`](https://github.com/apache/spark/commit/e225ea7c01f72fbda3cfe2bdb9f3c92cb7f6b246).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93598549
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30377/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92490912
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30175/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92183786
  
      [Test build #30136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30136/consoleFull) for   PR 5463 at commit [`e632315`](https://github.com/apache/spark/commit/e6323150374bca412a2b32a46d9443eaed42c606).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93803024
  
    jenkins, retest this please




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93827537
  
    **[Test build #30430 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30430/consoleFull)**     for PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7)     after a configured wait of `120m`.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93522744
  
    **[Test build #30351 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30351/consoleFull)**     for PR 5463 at commit [`3a85734`](https://github.com/apache/spark/commit/3a857345431fe7140057d621431096b7be493391)     after a configured wait of `120m`.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28434599
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -755,104 +769,115 @@ private[spark] class BlockManager(
           case _ => null
         }
     
    -    putBlockInfo.synchronized {
    -      logTrace("Put for block %s took %s to get into synchronized block"
    -        .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    -
    -      var marked = false
    -      try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useOffHeap) {
    -            // Use tachyon for off-heap storage
    -            (false, tachyonStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +    try {
    +      putBlockInfo.synchronized {
    +        logTrace("Put for block %s took %s to get into synchronized block"
    +          .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +
    +        var marked = false
    +        try {
    +          // returnValues - Whether to return the values put
    +          // blockStore - The type of storage to put these values into
    +          val (returnValues, blockStore: BlockStore) = {
    +            if (putLevel.useMemory) {
    +              // Put it in memory first, even if it also has useDisk set to true;
    +              // We will drop it to disk later if the memory store can't hold it.
    +              (true, memoryStore)
    +            } else if (putLevel.useOffHeap) {
    +              // Use tachyon for off-heap storage
    +              (false, tachyonStore)
    +            } else if (putLevel.useDisk) {
    +              // Don't get back the bytes from put unless we replicate them
    +              (putLevel.replication > 1, diskStore)
    +            } else {
    +              assert(putLevel == StorageLevel.NONE)
    +              throw new BlockException(
    +                blockId, s"Attempted to put block $blockId without specifying storage level!")
    +            }
               }
    -        }
     
    -        // Actually put the values
    -        val result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    -          case ArrayValues(array) =>
    -            blockStore.putArray(blockId, array, putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    -        }
    +          // Actually put the values
    +          val result = data match {
    +            case IteratorValues(iterator) =>
    +              blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    +            case ArrayValues(array) =>
    +              blockStore.putArray(blockId, array, putLevel, returnValues)
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              blockStore.putBytes(blockId, bytes, putLevel, resourceCleaner)
    +          }
    +          size = result.size
    +          result.data match {
    +            case Left(newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    +            case Right(newBytes) => bytesAfterPut = newBytes
    +            case _ =>
    +          }
     
    -        // Keep track of which blocks are dropped from memory
    -        if (putLevel.useMemory) {
    -          result.droppedBlocks.foreach { updatedBlocks += _ }
    -        }
    +          // Keep track of which blocks are dropped from memory
    +          if (putLevel.useMemory) {
    +            result.droppedBlocks.foreach {
    +              updatedBlocks += _
    +            }
    +          }
     
    -        val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    -        if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    -          // Now that the block is in either the memory, tachyon, or disk store,
    -          // let other threads read it, and tell the master about it.
    -          marked = true
    -          putBlockInfo.markReady(size)
    -          if (tellMaster) {
    -            reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +          val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    +          if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    +            // Now that the block is in either the memory, tachyon, or disk store,
    +            // let other threads read it, and tell the master about it.
    +            marked = true
    +            putBlockInfo.markReady(size)
    +            if (tellMaster) {
    +              reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +            }
    +            updatedBlocks += ((blockId, putBlockStatus))
    +          }
    +        } finally {
    +          // If we failed in putting the block to memory/disk, notify other possible readers
    +          // that it has failed, and then remove it from the block info map.
    +          if (!marked) {
    +            // Note that the remove must happen before markFailure otherwise another thread
    +            // could've inserted a new BlockInfo before we remove it.
    +            blockInfo.remove(blockId)
    +            putBlockInfo.markFailure()
    +            logWarning(s"Putting block $blockId failed")
               }
    -          updatedBlocks += ((blockId, putBlockStatus))
    -        }
    -      } finally {
    -        // If we failed in putting the block to memory/disk, notify other possible readers
    -        // that it has failed, and then remove it from the block info map.
    -        if (!marked) {
    -          // Note that the remove must happen before markFailure otherwise another thread
    -          // could've inserted a new BlockInfo before we remove it.
    -          blockInfo.remove(blockId)
    -          putBlockInfo.markFailure()
    -          logWarning(s"Putting block $blockId failed")
             }
           }
    -    }
    -    logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +      logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
    -    // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    -      data match {
    -        case ByteBufferValues(bytes) =>
    -          if (replicationFuture != null) {
    -            Await.ready(replicationFuture, Duration.Inf)
    -          }
    -        case _ =>
    -          val remoteStartTime = System.currentTimeMillis
    -          // Serialize the block if not already done
    -          if (bytesAfterPut == null) {
    -            if (valuesAfterPut == null) {
    -              throw new SparkException(
    -                "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +      // Either we're storing bytes and we asynchronously started replication, or we're storing
    +      // values and need to serialize and replicate them now:
    +      if (putLevel.replication > 1) {
    +        data match {
    +          case ByteBufferValues(bytes) =>
    +            if (replicationFuture != null) {
    +              Await.ready(replicationFuture, Duration.Inf)
                 }
    -            bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    -          }
    -          replicate(blockId, bytesAfterPut, putLevel)
    -          logDebug("Put block %s remotely took %s"
    -            .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +          case _ =>
    +            val remoteStartTime = System.currentTimeMillis
    +            // Serialize the block if not already done
    +            if (bytesAfterPut == null) {
    +              if (valuesAfterPut == null) {
    +                throw new SparkException(
    +                  "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +              }
    +              bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    +            }
    +            replicate(blockId, bytesAfterPut, putLevel)
    +            logDebug("Put block %s remotely took %s"
    +              .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +        }
           }
    -    }
     
    -    BlockManager.dispose(bytesAfterPut)
    +    } finally {
    +      BlockManager.dispose(bytesAfterPut)
    +      // this is to clean up the byte buffer from the *input* (as opposed to the line above, which
    +      // disposes the byte buffer that is the *result* of the put).  We might have turned that byte
    +      // buffer into an iterator of values, in which case the input ByteBuffer should be disposed.
    +      // It will automatically get disposed when we get to the end of the iterator, but we might
    +      // never even try to get to the end, or there might be an exception along the way
    +      resourceCleaner.doCleanup()
    --- End diff --
    
    repeating my previous comment, since it still applies despite the changed diff:
    
    this is probably the only case that is really unusual, and worth getting somebody else to look at. A call to `BlockManager.doPut` might try to deserialize a ByteBuffer it's given (if the storage level is memory deserialized), and then it might read to the end of that Iterator (if there is enough memory to read to the end of the iterator, or if the storage level allows dropping to disk). If both of those are true, the old code would dispose the input byte buffer. If not, then it never would.
    
    So the issue here is not so much that we're worried about exceptions from user code (since the iterator is never exposed to the user), but just being consistent on whether or not we dispose the input byte buffer.
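    To make the intended fix concrete, here's a minimal sketch of the cleaner pattern being discussed (hypothetical signatures — the real `ResourceCleaner` / `SimpleResourceCleaner` in the patch may differ). The put path would register a dispose of the input ByteBuffer and then call `doCleanup()` in a `finally`, so the buffer is disposed consistently whether or not the iterator was fully consumed:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical sketch of the ResourceCleaner idea from this PR's description;
    // the actual trait and method names in the patch may differ.
    trait ResourceCleaner {
      def addCleanup(f: () => Unit): Unit
      def doCleanup(): Unit
    }

    class SimpleResourceCleaner extends ResourceCleaner {
      private val cleanups = ArrayBuffer[() => Unit]()

      override def addCleanup(f: () => Unit): Unit = cleanups += f

      override def doCleanup(): Unit = {
        // Run every registered cleanup, even if an earlier one throws, so a
        // failure while disposing one buffer cannot leak the others.
        cleanups.foreach { f =>
          try f() catch { case _: Exception => }
        }
        cleanups.clear()
      }
    }
    ```

    With this in place, disposal no longer depends on whether the iterator happened to be read to the end.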




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93854101
  
    anybody have any idea what is going on here?  the tests all run fine on my laptop.  It's slow, but not so much worse than before.  The logs seem to indicate everything is fine until it times out.
    
    in the meantime, try try again ... jenkins, retest this please




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92454499
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30177/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28278804
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -755,104 +769,115 @@ private[spark] class BlockManager(
           case _ => null
         }
     
    -    putBlockInfo.synchronized {
    -      logTrace("Put for block %s took %s to get into synchronized block"
    -        .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    -
    -      var marked = false
    -      try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useOffHeap) {
    -            // Use tachyon for off-heap storage
    -            (false, tachyonStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +    try {
    +      putBlockInfo.synchronized {
    +        logTrace("Put for block %s took %s to get into synchronized block"
    +          .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +
    +        var marked = false
    +        try {
    +          // returnValues - Whether to return the values put
    +          // blockStore - The type of storage to put these values into
    +          val (returnValues, blockStore: BlockStore) = {
    +            if (putLevel.useMemory) {
    +              // Put it in memory first, even if it also has useDisk set to true;
    +              // We will drop it to disk later if the memory store can't hold it.
    +              (true, memoryStore)
    +            } else if (putLevel.useOffHeap) {
    +              // Use tachyon for off-heap storage
    +              (false, tachyonStore)
    +            } else if (putLevel.useDisk) {
    +              // Don't get back the bytes from put unless we replicate them
    +              (putLevel.replication > 1, diskStore)
    +            } else {
    +              assert(putLevel == StorageLevel.NONE)
    +              throw new BlockException(
    +                blockId, s"Attempted to put block $blockId without specifying storage level!")
    +            }
               }
    -        }
     
    -        // Actually put the values
    -        val result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    -          case ArrayValues(array) =>
    -            blockStore.putArray(blockId, array, putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    -        }
    +          // Actually put the values
    +          val result = data match {
    +            case IteratorValues(iterator) =>
    +              blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    +            case ArrayValues(array) =>
    +              blockStore.putArray(blockId, array, putLevel, returnValues)
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              blockStore.putBytes(blockId, bytes, putLevel, resourceCleaner)
    +          }
    +          size = result.size
    +          result.data match {
    +            case Left(newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    +            case Right(newBytes) => bytesAfterPut = newBytes
    +            case _ =>
    +          }
     
    -        // Keep track of which blocks are dropped from memory
    -        if (putLevel.useMemory) {
    -          result.droppedBlocks.foreach { updatedBlocks += _ }
    -        }
    +          // Keep track of which blocks are dropped from memory
    +          if (putLevel.useMemory) {
    +            result.droppedBlocks.foreach {
    +              updatedBlocks += _
    +            }
    +          }
     
    -        val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    -        if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    -          // Now that the block is in either the memory, tachyon, or disk store,
    -          // let other threads read it, and tell the master about it.
    -          marked = true
    -          putBlockInfo.markReady(size)
    -          if (tellMaster) {
    -            reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +          val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    +          if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    +            // Now that the block is in either the memory, tachyon, or disk store,
    +            // let other threads read it, and tell the master about it.
    +            marked = true
    +            putBlockInfo.markReady(size)
    +            if (tellMaster) {
    +              reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +            }
    +            updatedBlocks += ((blockId, putBlockStatus))
    +          }
    +        } finally {
    +          // If we failed in putting the block to memory/disk, notify other possible readers
    +          // that it has failed, and then remove it from the block info map.
    +          if (!marked) {
    +            // Note that the remove must happen before markFailure otherwise another thread
    +            // could've inserted a new BlockInfo before we remove it.
    +            blockInfo.remove(blockId)
    +            putBlockInfo.markFailure()
    +            logWarning(s"Putting block $blockId failed")
               }
    -          updatedBlocks += ((blockId, putBlockStatus))
    -        }
    -      } finally {
    -        // If we failed in putting the block to memory/disk, notify other possible readers
    -        // that it has failed, and then remove it from the block info map.
    -        if (!marked) {
    -          // Note that the remove must happen before markFailure otherwise another thread
    -          // could've inserted a new BlockInfo before we remove it.
    -          blockInfo.remove(blockId)
    -          putBlockInfo.markFailure()
    -          logWarning(s"Putting block $blockId failed")
             }
           }
    -    }
    -    logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +      logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
    -    // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    -      data match {
    -        case ByteBufferValues(bytes) =>
    -          if (replicationFuture != null) {
    -            Await.ready(replicationFuture, Duration.Inf)
    -          }
    -        case _ =>
    -          val remoteStartTime = System.currentTimeMillis
    -          // Serialize the block if not already done
    -          if (bytesAfterPut == null) {
    -            if (valuesAfterPut == null) {
    -              throw new SparkException(
    -                "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +      // Either we're storing bytes and we asynchronously started replication, or we're storing
    +      // values and need to serialize and replicate them now:
    +      if (putLevel.replication > 1) {
    +        data match {
    +          case ByteBufferValues(bytes) =>
    +            if (replicationFuture != null) {
    +              Await.ready(replicationFuture, Duration.Inf)
                 }
    -            bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    -          }
    -          replicate(blockId, bytesAfterPut, putLevel)
    -          logDebug("Put block %s remotely took %s"
    -            .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +          case _ =>
    +            val remoteStartTime = System.currentTimeMillis
    +            // Serialize the block if not already done
    +            if (bytesAfterPut == null) {
    +              if (valuesAfterPut == null) {
    +                throw new SparkException(
    +                  "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +              }
    +              bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    +            }
    +            replicate(blockId, bytesAfterPut, putLevel)
    +            logDebug("Put block %s remotely took %s"
    +              .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +        }
           }
    -    }
     
    -    BlockManager.dispose(bytesAfterPut)
    +      BlockManager.dispose(bytesAfterPut)
    +    } finally {
    +      // this is to clean up the byte buffer from the *input* (as opposed to the line above, which
    +      // disposes the byte buffer that is the *result* of the put).  We might have turned that byte
    +      // buffer into an iterator of values, in which case the input ByteBuffer should be disposed.
    +      // It will automatically get disposed when we get to the end of the iterator, but in case
    +      // there is some exception before we do, this will take care of it
    +      resourceCleaner.doCleanup()
    --- End diff --
    
    this is probably the only case that is really unusual, and worth getting somebody else to look at.  A call to `BlockManager.doPut` *might* deserialize the ByteBuffer it's given (if the storage level is memory-deserialized), and then it *might* read to the end of the resulting iterator (if there is enough memory to hold it all, *or* if the storage level allows dropping to disk).  Only when both of those are true would the old code `dispose` the input byte buffer; otherwise it never would.
    
    So the issue here is not so much that we're worried about exceptions from user code (since the iterator is never exposed to the user), but just being consistent about whether or not we dispose the input byte buffer.
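
    To make the cleanup contract concrete, here is a minimal, hypothetical sketch of the `ResourceCleaner` abstraction this PR introduces. The class names (`ResourceCleaner`, `SimpleResourceCleaner`) and `doCleanup()` come from the diff and the build reports above; the registration method `addCleanup` and the bodies are assumptions, not the actual patch:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical sketch, not the patch itself: a trait that collects cleanup
    // functions (e.g. disposing an input ByteBuffer) to run later.
    trait ResourceCleaner {
      def addCleanup(cleanup: () => Unit): Unit
    }

    class SimpleResourceCleaner extends ResourceCleaner {
      private val cleanups = ArrayBuffer.empty[() => Unit]

      override def addCleanup(cleanup: () => Unit): Unit = cleanups += cleanup

      // Callers invoke this from a `finally` block, so registered cleanups run
      // even if deserialization fails partway through the stream.
      def doCleanup(): Unit = cleanups.foreach(_.apply())
    }
    ```

    Under this sketch, `TaskContext` would mix in `ResourceCleaner` where a task context exists, and call sites without one (like `doPut` above) would create a `SimpleResourceCleaner` and invoke `doCleanup()` in their own `finally`.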


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92512403
  
    I talked to @squito offline and it doesn't seem like there's a less intrusive solution to this issue, so this looks OK to me. But I'm not that familiar with these code paths, so let's wait for more eyes.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92169356
  
      [Test build #30130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30130/consoleFull) for   PR 5463 at commit [`29a22a9`](https://github.com/apache/spark/commit/29a22a97dc98b3c7773f249f7f4da736121cb802).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93461723
  
      [Test build #30350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30350/consoleFull) for   PR 5463 at commit [`fd90a7a`](https://github.com/apache/spark/commit/fd90a7a588c2a9d889420a414af00df4f1234d13).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92454492
  
      [Test build #30177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30177/consoleFull) for   PR 5463 at commit [`cd05247`](https://github.com/apache/spark/commit/cd05247be944d798b20b7ccf0c8951272d7d8e7e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92445769
  
      [Test build #30169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30169/consoleFull) for   PR 5463 at commit [`7797c69`](https://github.com/apache/spark/commit/7797c69352d89422037cb96dafc3e8f88ad44699).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92173500
  
      [Test build #30134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30134/consoleFull) for   PR 5463 at commit [`15106ab`](https://github.com/apache/spark/commit/15106ab96806482276f3aec2480c9e0f41a359ed).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93533480
  
    jenkins, retest this please




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91736215
  
      [Test build #30063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30063/consoleFull) for   PR 5463 at commit [`76bf6f2`](https://github.com/apache/spark/commit/76bf6f2cbd253b79a285c62c488abfa7fed43a09).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92491757
  
      [Test build #30180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30180/consoleFull) for   PR 5463 at commit [`adae1e3`](https://github.com/apache/spark/commit/adae1e3d08540fbb31dcd9f883914b011c4191a9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93608902
  
      [Test build #30388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30388/consoleFull) for   PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93573359
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30368/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28283453
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockStore.scala ---
    @@ -22,13 +22,18 @@ import java.nio.ByteBuffer
     import scala.collection.mutable.ArrayBuffer
     
     import org.apache.spark.Logging
    +import org.apache.spark.util.ResourceCleaner
     
     /**
      * Abstract class to store blocks.
      */
     private[spark] abstract class BlockStore(val blockManager: BlockManager) extends Logging {
     
    -  def putBytes(blockId: BlockId, bytes: ByteBuffer, level: StorageLevel): PutResult
    +  def putBytes(
    +    blockId: BlockId,
    --- End diff --
    
    nit: these arguments should be double-indented




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93608332
  
    jenkins, retest this please




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28278835
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -755,104 +769,115 @@ private[spark] class BlockManager(
           case _ => null
         }
     
    -    putBlockInfo.synchronized {
    -      logTrace("Put for block %s took %s to get into synchronized block"
    -        .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    -
    -      var marked = false
    -      try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useOffHeap) {
    -            // Use tachyon for off-heap storage
    -            (false, tachyonStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +    try {
    +      putBlockInfo.synchronized {
    +        logTrace("Put for block %s took %s to get into synchronized block"
    +          .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +
    +        var marked = false
    +        try {
    +          // returnValues - Whether to return the values put
    +          // blockStore - The type of storage to put these values into
    -    BlockManager.dispose(bytesAfterPut)
    +      BlockManager.dispose(bytesAfterPut)
    +    } finally {
    +      // this is to clean up the byte buffer from the *input* (as opposed to the line above, which
    +      // disposes the byte buffer that is the *result* of the put).  We might have turned that byte
    +      // buffer into an iterator of values, in which case the input ByteBuffer should be disposed.
    +      // It will automatically get disposed when we get to the end of the iterator, but in case
    +      // there is some exception before we do, this will take care of it
    +      resourceCleaner.doCleanup()
    --- End diff --
    
    and while I'm looking here ... should the `BlockManager.dispose(bytesAfterPut)` also be in a `finally`?
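
    To illustrate the question: a minimal, hypothetical sketch (not the actual `BlockManager` code; `serialize`, `replicate`, and `dispose` here stand in for the real methods) of what moving the dispose into a `finally` would buy — the serialized buffer is freed even if `replicate` throws:

    ```scala
    // Hypothetical sketch: dispose the serialized bytes on both the
    // success path and the failure path by using a finally block.
    def putAndReplicate(serialize: () => Array[Byte],
                        replicate: Array[Byte] => Unit,
                        dispose: Array[Byte] => Unit): Unit = {
      var bytesAfterPut: Array[Byte] = null
      try {
        bytesAfterPut = serialize()
        replicate(bytesAfterPut)
      } finally {
        // Runs whether or not replicate() threw, so the buffer is
        // never leaked on the error path.
        if (bytesAfterPut != null) dispose(bytesAfterPut)
      }
    }
    ```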


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93519347
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30350/
    Test PASSed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by squito <gi...@git.apache.org>.
Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-94051965
  
    jenkins, retest this please




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5463#discussion_r28283431
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -755,104 +769,115 @@ private[spark] class BlockManager(
           case _ => null
         }
     
    -    putBlockInfo.synchronized {
    -      logTrace("Put for block %s took %s to get into synchronized block"
    -        .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    -
    -      var marked = false
    -      try {
    -        // returnValues - Whether to return the values put
    -        // blockStore - The type of storage to put these values into
    -        val (returnValues, blockStore: BlockStore) = {
    -          if (putLevel.useMemory) {
    -            // Put it in memory first, even if it also has useDisk set to true;
    -            // We will drop it to disk later if the memory store can't hold it.
    -            (true, memoryStore)
    -          } else if (putLevel.useOffHeap) {
    -            // Use tachyon for off-heap storage
    -            (false, tachyonStore)
    -          } else if (putLevel.useDisk) {
    -            // Don't get back the bytes from put unless we replicate them
    -            (putLevel.replication > 1, diskStore)
    -          } else {
    -            assert(putLevel == StorageLevel.NONE)
    -            throw new BlockException(
    -              blockId, s"Attempted to put block $blockId without specifying storage level!")
    +    try {
    +      putBlockInfo.synchronized {
    +        logTrace("Put for block %s took %s to get into synchronized block"
    +          .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +
    +        var marked = false
    +        try {
    +          // returnValues - Whether to return the values put
    +          // blockStore - The type of storage to put these values into
    +          val (returnValues, blockStore: BlockStore) = {
    +            if (putLevel.useMemory) {
    +              // Put it in memory first, even if it also has useDisk set to true;
    +              // We will drop it to disk later if the memory store can't hold it.
    +              (true, memoryStore)
    +            } else if (putLevel.useOffHeap) {
    +              // Use tachyon for off-heap storage
    +              (false, tachyonStore)
    +            } else if (putLevel.useDisk) {
    +              // Don't get back the bytes from put unless we replicate them
    +              (putLevel.replication > 1, diskStore)
    +            } else {
    +              assert(putLevel == StorageLevel.NONE)
    +              throw new BlockException(
    +                blockId, s"Attempted to put block $blockId without specifying storage level!")
    +            }
               }
    -        }
     
    -        // Actually put the values
    -        val result = data match {
    -          case IteratorValues(iterator) =>
    -            blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    -          case ArrayValues(array) =>
    -            blockStore.putArray(blockId, array, putLevel, returnValues)
    -          case ByteBufferValues(bytes) =>
    -            bytes.rewind()
    -            blockStore.putBytes(blockId, bytes, putLevel)
    -        }
    -        size = result.size
    -        result.data match {
    -          case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    -          case Right (newBytes) => bytesAfterPut = newBytes
    -          case _ =>
    -        }
    +          // Actually put the values
    +          val result = data match {
    +            case IteratorValues(iterator) =>
    +              blockStore.putIterator(blockId, iterator, putLevel, returnValues)
    +            case ArrayValues(array) =>
    +              blockStore.putArray(blockId, array, putLevel, returnValues)
    +            case ByteBufferValues(bytes) =>
    +              bytes.rewind()
    +              blockStore.putBytes(blockId, bytes, putLevel, resourceCleaner)
    +          }
    +          size = result.size
    +          result.data match {
    +            case Left(newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
    +            case Right(newBytes) => bytesAfterPut = newBytes
    +            case _ =>
    +          }
     
    -        // Keep track of which blocks are dropped from memory
    -        if (putLevel.useMemory) {
    -          result.droppedBlocks.foreach { updatedBlocks += _ }
    -        }
    +          // Keep track of which blocks are dropped from memory
    +          if (putLevel.useMemory) {
    +            result.droppedBlocks.foreach {
    +              updatedBlocks += _
    +            }
    +          }
     
    -        val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    -        if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    -          // Now that the block is in either the memory, tachyon, or disk store,
    -          // let other threads read it, and tell the master about it.
    -          marked = true
    -          putBlockInfo.markReady(size)
    -          if (tellMaster) {
    -            reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +          val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
    +          if (putBlockStatus.storageLevel != StorageLevel.NONE) {
    +            // Now that the block is in either the memory, tachyon, or disk store,
    +            // let other threads read it, and tell the master about it.
    +            marked = true
    +            putBlockInfo.markReady(size)
    +            if (tellMaster) {
    +              reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
    +            }
    +            updatedBlocks += ((blockId, putBlockStatus))
    +          }
    +        } finally {
    +          // If we failed in putting the block to memory/disk, notify other possible readers
    +          // that it has failed, and then remove it from the block info map.
    +          if (!marked) {
    +            // Note that the remove must happen before markFailure otherwise another thread
    +            // could've inserted a new BlockInfo before we remove it.
    +            blockInfo.remove(blockId)
    +            putBlockInfo.markFailure()
    +            logWarning(s"Putting block $blockId failed")
               }
    -          updatedBlocks += ((blockId, putBlockStatus))
    -        }
    -      } finally {
    -        // If we failed in putting the block to memory/disk, notify other possible readers
    -        // that it has failed, and then remove it from the block info map.
    -        if (!marked) {
    -          // Note that the remove must happen before markFailure otherwise another thread
    -          // could've inserted a new BlockInfo before we remove it.
    -          blockInfo.remove(blockId)
    -          putBlockInfo.markFailure()
    -          logWarning(s"Putting block $blockId failed")
             }
           }
    -    }
    -    logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    +      logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
    -    // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    -      data match {
    -        case ByteBufferValues(bytes) =>
    -          if (replicationFuture != null) {
    -            Await.ready(replicationFuture, Duration.Inf)
    -          }
    -        case _ =>
    -          val remoteStartTime = System.currentTimeMillis
    -          // Serialize the block if not already done
    -          if (bytesAfterPut == null) {
    -            if (valuesAfterPut == null) {
    -              throw new SparkException(
    -                "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +      // Either we're storing bytes and we asynchronously started replication, or we're storing
    +      // values and need to serialize and replicate them now:
    +      if (putLevel.replication > 1) {
    +        data match {
    +          case ByteBufferValues(bytes) =>
    +            if (replicationFuture != null) {
    +              Await.ready(replicationFuture, Duration.Inf)
                 }
    -            bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    -          }
    -          replicate(blockId, bytesAfterPut, putLevel)
    -          logDebug("Put block %s remotely took %s"
    -            .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +          case _ =>
    +            val remoteStartTime = System.currentTimeMillis
    +            // Serialize the block if not already done
    +            if (bytesAfterPut == null) {
    +              if (valuesAfterPut == null) {
    +                throw new SparkException(
    +                  "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
    +              }
    +              bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
    +            }
    +            replicate(blockId, bytesAfterPut, putLevel)
    +            logDebug("Put block %s remotely took %s"
    +              .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
    +        }
           }
    -    }
     
    -    BlockManager.dispose(bytesAfterPut)
    +      BlockManager.dispose(bytesAfterPut)
    +    } finally {
    +      // this is to clean up the byte buffer from the *input* (as opposed to the line above, which
    +      // disposes the byte buffer that is the *result* of the put).  We might have turned that byte
    +      // buffer into an iterator of values, in which case the input ByteBuffer should be disposed.
    +      // It will automatically get disposed when we get to the end of the iterator, but in case
    +      // there is some exception before we do, this will take care of it
    +      resourceCleaner.doCleanup()
    --- End diff --
    
    Cleaning up in a `finally` block should always be the default, unless there's a good explanation for not doing it.
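
    For reference, a minimal sketch of the `ResourceCleaner` idea this patch introduces (simplified; the real trait is mixed into Spark's `TaskContext`, and the method names here are assumed from the class list in the test output). Callers register cleanup functions, and a wrapper runs `doCleanup()` in a `finally` block, which is the default vanzin is suggesting:

    ```scala
    // Simplified sketch of the ResourceCleaner pattern from this PR.
    trait ResourceCleaner {
      def addCleanup(f: () => Unit): Unit
      def doCleanup(): Unit
    }

    class SimpleResourceCleaner extends ResourceCleaner {
      private val cleanups = scala.collection.mutable.ArrayBuffer[() => Unit]()
      def addCleanup(f: () => Unit): Unit = cleanups += f
      // Run every registered cleanup function, e.g. disposing ByteBuffers.
      def doCleanup(): Unit = cleanups.foreach(f => f())
    }

    def withCleaner[T](body: ResourceCleaner => T): T = {
      val cleaner = new SimpleResourceCleaner
      try body(cleaner)
      finally cleaner.doCleanup()  // cleanup in finally, even on exceptions
    }
    ```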




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92408211
  
      [Test build #30169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30169/consoleFull) for   PR 5463 at commit [`7797c69`](https://github.com/apache/spark/commit/7797c69352d89422037cb96dafc3e8f88ad44699).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93465485
  
      [Test build #30351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30351/consoleFull) for   PR 5463 at commit [`3a85734`](https://github.com/apache/spark/commit/3a857345431fe7140057d621431096b7be493391).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-91736017
  
      [Test build #30063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30063/consoleFull) for   PR 5463 at commit [`76bf6f2`](https://github.com/apache/spark/commit/76bf6f2cbd253b79a285c62c488abfa7fed43a09).




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92205153
  
      [Test build #30136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30136/consoleFull) for   PR 5463 at commit [`e632315`](https://github.com/apache/spark/commit/e6323150374bca412a2b32a46d9443eaed42c606).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
      * `trait ResourceCleaner `
      * `class SimpleResourceCleaner extends ResourceCleaner `
    
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`





[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93573347
  
      [Test build #30368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30368/consoleFull) for   PR 5463 at commit [`3a85734`](https://github.com/apache/spark/commit/3a857345431fe7140057d621431096b7be493391).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class TaskContext extends Serializable with ResourceCleaner `
    
     * This patch **adds the following new dependencies:**
       * `commons-math3-3.1.1.jar`
       * `snappy-java-1.1.1.6.jar`
    
     * This patch **removes the following dependencies:**
       * `commons-math3-3.4.1.jar`
       * `snappy-java-1.1.1.7.jar`





[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92205170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30136/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92186454
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30134/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93827549
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30430/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93598539
  
    **[Test build #30377 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30377/consoleFull)**     for PR 5463 at commit [`8961f66`](https://github.com/apache/spark/commit/8961f66f5240d4ec49672ec4de4c434d40d1e1d7)     after a configured wait of `120m`.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-93624304
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30388/
    Test FAILed.




[GitHub] spark pull request: [Spark-6839] BlockManger.dataDeserialize must ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5463#issuecomment-92457831
  
      [Test build #30180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30180/consoleFull) for   PR 5463 at commit [`adae1e3`](https://github.com/apache/spark/commit/adae1e3d08540fbb31dcd9f883914b011c4191a9).

