Posted to reviews@spark.apache.org by holdenk <gi...@git.apache.org> on 2014/05/20 01:52:30 UTC

[GitHub] spark pull request: Spark 1857 improve error message when trying p...

GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/831

    Spark 1857 improve error message when trying to perform a Spark operation inside another Spark operation

    This is a quick little PR that should improve the error message when trying to perform a Spark operation inside another Spark operation. It's implemented by adding a getPartitioner function that checks whether the partitioner is null, and by adding the same check to the existing sparkContext function on RDDs.
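
For reference, the failure mode this PR targets can be reproduced with a small stand-alone program like the sketch below (illustrative only; the class name, local-mode setup, and data are assumptions, not part of the PR). It performs an RDD action inside another RDD operation, so the referenced RDD's transient fields come back null on the executors:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD functions such as lookup (Spark 1.x)

    object NestedRddRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("nested-rdd-repro").setMaster("local[2]"))
        val table = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
          .partitionBy(new HashPartitioner(2))
        val queries = sc.parallelize(Seq(1, 2))

        // `table.lookup(k)` runs inside a map over `queries`, i.e. in a task on an
        // executor, where `table`'s transient SparkContext and partitioner have been
        // nulled out by serialization. Per the discussion below, this surfaced as an
        // opaque NullPointerException rather than a clear "not supported" error.
        val results = queries.map(k => table.lookup(k))
        results.collect()
        sc.stop()
      }
    }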

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark spark-1857-improve-error-message-when-trying-perform-a-spark-operation-inside-another-spark-operation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/831.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #831
    
----
commit a88ac121deef213c96dbf2f1968dfef328e26e9e
Author: Holden Karau <ho...@pigscanfly.ca>
Date:   2014-05-19T22:44:41Z

    A quick pass at improving our error messages when trying to perform an RDD operation inside of another RDD operation. This refactors the access to the transient sc and partitioner fields so that they can be checked. We don't change partitioner itself since it is meant to be overridden.

commit fa0ec8b571b97892a55fd40cdfb2b4199de146a3
Author: Holden Karau <ho...@pigscanfly.ca>
Date:   2014-05-19T23:50:36Z

    Call getPartitioner in some other cases where we were using partitioner

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43693054
  
    Merged build finished. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43692891
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12822662
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    --- End diff --
    
    This does not seem like a good, intuitive solution for identifying whether an RDD operation is being executed inside an executor. What if, in the future, this field is not transient?



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-50692903
  
    Sure thing - do you mind closing this for now then? We're trying to cut down the number of open PRs that have gone temporarily stale :)



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-45692281
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-48803493
  
    @pwendell : That sounds like a good plan. I'll give it a shot.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12824726
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    Without the partitioner check, what error does it throw? An NPE? In that case, can this be handled by the lookup() function, rather than introducing a non-intuitive check for the partitioner?



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12825699
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    I do get the point. I guess my primary concern is the public API change; we really don't want to be changing the RDD API if there are ways around it. In particular, adding `getPartitioner` is not the right way, as that gives users two methods, `RDD.partitioner` and `RDD.getPartitioner`, for accessing the partitioner. That's confusing.
    
    The right way would be to convert `RDD.partitioner` from a val to a def. It would use an internal private field called `partitioner_` to store the partitioner and check for null. See `RDD.partitions` or `RDD.dependencies` for reference.
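
For reference, a minimal stand-alone sketch of the val-to-def pattern suggested above (the class name and the exception message are assumptions; this is not the actual RDD.scala change):

    import org.apache.spark.{Partitioner, SparkException}

    abstract class GuardedRdd(@transient private val partitioner_ : Option[Partitioner]) {

      /** Keeps the existing `partitioner` name, so the RDD API gains no second method. */
      def partitioner: Option[Partitioner] = {
        // The transient field is nulled out when this object is deserialized inside a
        // task on an executor, which is exactly the nested-operation case.
        if (partitioner_ == null) {
          throw new SparkException(
            "RDD operations cannot be performed inside another RDD operation")
        }
        partitioner_
      }
    }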



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572369
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15089/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12822822
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    It probably is. I figured catching it earlier would be good, but I can take out the getPartitioner changes.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43575322
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43575326
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15090/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12822684
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    In fact, this is definitely a better way to identify RDD operations inside RDD operations. Why isn't this sufficient (such that you also have to check the partitioner)?



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43694156
  
    Merged build finished. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12824928
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    It's an NPE. Matei suggested replacing it with a less obtuse error message in https://issues.apache.org/jira/browse/SPARK-1857 . I think this is a better place to put the check since:
    1) Other people might try to access the partitioner, and then they get a sensible error message
    2) Checking for an NPE inside of the lookup() function isn't any more intuitive
    3) We already need to add a similar check to context/SparkContext here



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12822863
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    +    a clearer error method when attempting to perform operations on an RDD inside of
    +    a parallel operation as the partitioner is marked as transient */
    +  def getPartitioner: Option[Partitioner] = {
    +    partitioner match {
    +      case null => throw new SparkException("Actions on RDDs inside of another RDD operation are " +
    +          "not supported")
    +      case _ => partitioner
    +    }
    +  }
    +
       /** The SparkContext that created this RDD. */
    -  def sparkContext: SparkContext = sc
    +  def sparkContext: SparkContext = {
    --- End diff --
    
    Actually, wait, it isn't. We would still get a null pointer exception in the case of lookup, rather than a helpful error message, since lookup peeks at the partitioner before doing anything with the SparkContext.
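
For reference, a purely illustrative, lookup-shaped method (not Spark's actual lookup code; the class and its body are assumptions) showing the ordering point: the partitioner is consulted before the context is ever needed, so guarding only the context accessors leaves the partitioner access unprotected:

    import org.apache.spark.Partitioner

    // Stand-in for a pair RDD: `partitioner` plays the role of the transient field
    // that comes back null inside a task, and `records` stands in for the data.
    class LookupLike[K, V](partitioner: Option[Partitioner], records: Seq[(K, V)]) {
      def lookup(key: K): Seq[V] = partitioner match {  // consulted before any context use
        case Some(p) =>
          val idx = p.getPartition(key)                 // narrow the search to one "partition"
          records.filter(kv => p.getPartition(kv._1) == idx && kv._1 == key).map(_._2)
        case None =>
          records.filter(_._1 == key).map(_._2)         // full scan fallback
      }
    }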



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-45692330
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15661/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43692883
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43573230
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43694027
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43575324
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572296
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43694015
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43694158
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15111/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572367
  
    Merged build finished. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12822810
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    --- End diff --
    
    We would still probably catch the error when it went to use the SparkContext to call runJob; it would just be a bit further down the call stack. (Although, having just traced the code, it looks like we would still get a null pointer exception, since we have a second accessor method to the internal SparkContext, called `context`, so I wrapped that accessor as well.)
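
For reference, a minimal stand-alone sketch of guarding both accessors to the transient SparkContext behind one check (the class, field, and message names are assumptions, not the PR's exact code):

    import org.apache.spark.{SparkContext, SparkException}

    abstract class GuardedContextAccess(@transient private val sc: SparkContext) {
      // Shared guard: the transient sc field is null once the object has been
      // deserialized inside a task on an executor.
      private def checkedSc: SparkContext = {
        if (sc == null) {
          throw new SparkException(
            "RDD operations cannot be performed inside another RDD operation")
        }
        sc
      }

      /** The SparkContext that created this RDD. */
      def sparkContext: SparkContext = checkedSc

      /** Second accessor to the same context; wrapped as well, as noted above. */
      def context: SparkContext = checkedSc
    }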



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-50693037
  
    Sure, I should probably restart it from scratch anyway.
    
    



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572284
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r13413232
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    --- End diff --
    
    It adds to the API of RDD, and the previous usage was unsafe, so I'd argue this is better; but I can understand the argument, so I'll hack it in without this change.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-45692288
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572882
  
     Merged build triggered. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43575327
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15091/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/831#discussion_r12824809
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -118,8 +118,25 @@ abstract class RDD[T: ClassTag](
       // Methods and fields available on all RDDs
       // =======================================================================
     
    +  /** Accessor method which throws a runtime exception if null. This lets us have
    --- End diff --
    
    The reason I am wary of introducing a getPartitioner() is that it changes the public API of RDD, which does not seem worth it just to solve this problem.



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43572894
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43693056
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15110/



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-45804638
  
    @holdenk - I think a better way would be to modify the ClosureCleaner. In that part of the code we already walk through all of the objects referenced inside of the closure, so I think you could just check whether an object in the closure is a subtype of RDD. We could choose to throw an exception there or to just log an error message (I could see some cases where users might have used this and it actually worked, e.g. if they were including a reference to the RDD and just calling `getName` or something on it).
    
    The current approach is a bit roundabout, and it relies on implementation artifacts to determine when we are in this case. I wonder if we could have a much smaller/nicer fix by just looking directly at the closure.
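
For reference, a rough stand-alone sketch of the closure-side check suggested above (not a change to Spark's actual ClosureCleaner; the object name, the one-level field walk, and the choice to throw rather than log are assumptions):

    import org.apache.spark.SparkException
    import org.apache.spark.rdd.RDD

    object NestedRddCheck {
      /** Values captured in the closure object's own declared fields (one level deep). */
      private def capturedValues(closure: AnyRef): Seq[AnyRef] =
        closure.getClass.getDeclaredFields.toSeq.map { field =>
          field.setAccessible(true)
          field.get(closure)
        }

      /** Fail fast (this could instead just log) if the closure drags in another RDD. */
      def failIfCapturesRdd(closure: AnyRef): Unit =
        capturedValues(closure).foreach {
          case _: RDD[_] =>
            throw new SparkException(
              "This closure references another RDD; RDD operations cannot be " +
              "performed inside another RDD operation")
          case _ => // not an RDD reference; fine
        }
    }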



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-43573234
  
    Merged build started. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/831#issuecomment-45692329
  
    Merged build finished. 



[GitHub] spark pull request: Spark 1857 improve error message when trying p...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk closed the pull request at:

    https://github.com/apache/spark/pull/831

