You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2015/10/12 04:42:43 UTC

[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/9070

    [SPARK-11060][Streaming] Fix some potential NPE in DStream transformation

    This patch fixes:
    
    1. Guard out against NPEs in `TransformedDStream` when parent DStream returns None instead of empty RDD.
    2. Verify some input streams which will potentially return None.
    3. Add unit test to verify the behavior when input stream returns None.
    
    cc @tdas , please help to review, thanks a lot.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-11060

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9070.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9070
    
----
commit 458c9753bbef94f213c4492f44471ac562c51609
Author: jerryshao <ss...@hortonworks.com>
Date:   2015-10-12T02:23:00Z

    Fix some potential NPE in DStream transformation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9070#discussion_r41731681
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala ---
    @@ -39,7 +39,10 @@ class TransformedDStream[U: ClassTag] (
       override def slideDuration: Duration = parents.head.slideDuration
     
       override def compute(validTime: Time): Option[RDD[U]] = {
    -    val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq
    +    val parentRDDs = parents.map { parent => parent.getOrCompute(validTime).getOrElse(
    --- End diff --
    
    You've probably thought through this, so I'm mostly asking: do we want to assume the parent can be computed? in some places `null` or `None` does correctly mean "can't be computed now". That may not be here. I'm making sure we don't 'fix' the wrong part.
    
    PS you can still retain the compact `_.getOrCompute` syntax here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147285741
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9070#discussion_r41731988
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala ---
    @@ -39,7 +39,10 @@ class TransformedDStream[U: ClassTag] (
       override def slideDuration: Duration = parents.head.slideDuration
     
       override def compute(validTime: Time): Option[RDD[U]] = {
    -    val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq
    +    val parentRDDs = parents.map { parent => parent.getOrCompute(validTime).getOrElse(
    --- End diff --
    
    If "can't be computed now" and returns None, here `orNull` should not be used, it will get null and pass by user, which will lead to NPE. For example:
    
    ```
    inputStream.transform(_.union(rdd))
    ```
    
    If inputStream returns `None` as what you mean by "can't be computed now", this clause will throw NPE, so we should guard out this situation.
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147276849
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-148005769
  
    Ping @tdas for a look just in case but this seems fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147285743
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43554/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147276843
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147277008
  
      [Test build #43554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43554/consoleFull) for   PR 9070 at commit [`458c975`](https://github.com/apache/spark/commit/458c9753bbef94f213c4492f44471ac562c51609).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9070#discussion_r41732138
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala ---
    @@ -39,7 +39,10 @@ class TransformedDStream[U: ClassTag] (
       override def slideDuration: Duration = parents.head.slideDuration
     
       override def compute(validTime: Time): Option[RDD[U]] = {
    -    val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq
    +    val parentRDDs = parents.map { parent => parent.getOrCompute(validTime).getOrElse(
    --- End diff --
    
    Also see here in [UnionDStream](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/UnionDStream.scala#L44), also has such defensive assumption.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9070#issuecomment-147285698
  
      [Test build #43554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43554/console) for   PR 9070 at commit [`458c975`](https://github.com/apache/spark/commit/458c9753bbef94f213c4492f44471ac562c51609).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9070


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9070#discussion_r41731525
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/ConstantInputDStream.scala ---
    @@ -27,6 +28,9 @@ import scala.reflect.ClassTag
     class ConstantInputDStream[T: ClassTag](ssc_ : StreamingContext, rdd: RDD[T])
       extends InputDStream[T](ssc_) {
     
    +  require(rdd != null,
    --- End diff --
    
    (This could have been resolved with https://github.com/apache/spark/pull/8881 for more consistency rather than a different patch)
    
    Is this inconsistent with SPARK-10772 and the other changes, which cause `SparkException`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11060][Streaming] Fix some potential NP...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9070#discussion_r41733627
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala ---
    @@ -39,7 +39,10 @@ class TransformedDStream[U: ClassTag] (
       override def slideDuration: Duration = parents.head.slideDuration
     
       override def compute(validTime: Time): Option[RDD[U]] = {
    -    val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq
    +    val parentRDDs = parents.map { parent => parent.getOrCompute(validTime).getOrElse(
    --- End diff --
    
    Sounds good, I see why this is consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org