Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2014/07/29 23:25:07 UTC

[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/1639

    [SPARK-2737] Add retag() method for changing RDDs' ClassTags.

    The Java API's use of fake ClassTags doesn't seem to cause any problems for Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs to Scala code (e.g. in the MLlib Java API wrapper code). If we call collect() on a Scala RDD with an incorrect ClassTag, this causes ClassCastExceptions when we try to allocate an array of the wrong type (for example, see SPARK-2197).
    
    There are a few possible fixes here. An API-breaking fix would be to completely remove the fake ClassTags and require Java API users to pass java.lang.Class instances to all parallelize() calls and add returnClass fields to all Function implementations. This would be extremely verbose.
    
    Instead, this patch adds internal APIs to "repair" a Scala RDD with an incorrect ClassTag by wrapping it and overriding its ClassTag. This should be okay for cases where the Scala code that calls collect() knows what type of array should be allocated, which is the case in the MLlib wrappers.
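The failure mode described above can be reproduced without Spark. This is a minimal sketch, assuming the fake ClassTag is an AnyRef tag cast to the requested type; `fakeClassTag`, `toArrayWithTag`, and `collectStrings` are hypothetical names used for illustration, and `toArrayWithTag` only stands in for collect().

```scala
import scala.reflect.ClassTag

object FakeClassTagDemo {
  // Assumption: a "fake" tag is an AnyRef tag that claims to be a ClassTag[T].
  def fakeClassTag[T]: ClassTag[T] = ClassTag.AnyRef.asInstanceOf[ClassTag[T]]

  // Stand-in for collect(): the result array is allocated from the ClassTag.
  def toArrayWithTag[T](items: Seq[T])(implicit tag: ClassTag[T]): Array[T] = {
    val out = tag.newArray(items.length)
    items.zipWithIndex.foreach { case (x, i) => out(i) = x }
    out
  }

  // Scala caller that expects an Array[String]. With the fake tag the
  // runtime array is an Object[], so the cast at this call site fails.
  def collectStrings(useFake: Boolean): Array[String] = {
    val tag: ClassTag[String] =
      if (useFake) fakeClassTag[String] else implicitly[ClassTag[String]]
    toArrayWithTag(Seq("a", "b"))(tag)
  }
}
```

Calling `FakeClassTagDemo.collectStrings(true)` throws a ClassCastException, while `collectStrings(false)` returns a real String array.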

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-2737

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1639
    
----
commit eb1c7feec65a04ffc9d343ea74c19137161ce44f
Author: Josh Rosen <jo...@apache.org>
Date:   2014-07-29T21:16:47Z

    [SPARK-2737] Add retag() method for changing RDDs' ClassTags.
    
    The Java API's use of fake ClassTags doesn't seem to cause any problems for
    Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs to
    Scala code (e.g. in the MLlib Java API wrapper code). If we call collect() on
    a Scala RDD with an incorrect ClassTag, this causes ClassCastExceptions when we
    try to allocate an array of the wrong type (for example, see SPARK-2197).
    
    There are a few possible fixes here. An API-breaking fix would be to completely
    remove the fake ClassTags and require Java API users to pass java.lang.Class
    instances to all parallelize() calls and add returnClass fields to all Function
    implementations. This would be extremely verbose.
    
    Instead, this patch adds internal APIs to "repair" a Scala RDD with an
    incorrect ClassTag by wrapping it and overriding its ClassTag. This should be
    okay for cases where the Scala code that calls collect() knows what type of
    array should be allocated, which is the case in the MLlib wrappers.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50656907
  
    LGTM, feel free to merge it when it passes tests



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1639



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50714462
  
    QA results for PR 1639:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17550/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50558104
  
    I'm okay with either this or collectSeq actually.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50541554
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17381/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50711637
  
    Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1639#discussion_r15561032
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1239,6 +1239,28 @@ abstract class RDD[T: ClassTag](
       /** The [[org.apache.spark.SparkContext]] that this RDD was created on. */
       def context = sc
     
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(cls: Class[T]): RDD[T] = {
    +    val classTag: ClassTag[T] = ClassTag.apply(cls)
    +    this.retag(classTag)
    +  }
    +
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(classTag: ClassTag[T]): RDD[T] = {
    +    val oldRDD = this
    +    new RDD[T](sc, Seq(new OneToOneDependency(this)))(classTag) {
    +      override protected def getPartitions: Array[Partition] = oldRDD.getPartitions
    +      override def compute(split: Partition, context: TaskContext): Iterator[T] =
    +        oldRDD.compute(split, context)
    +    }
    --- End diff --
    
    You also need to preserve the Partitioner and such. It would be better to do this via `this.mapPartitions` with the `preservesPartitioning` option set to true.
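A toy model of this suggestion, runnable without Spark. `ToyRDD` and its fields are hypothetical stand-ins rather than Spark classes, and the partitioner is reduced to a label; the sketch only shows that an identity `mapPartitions` can swap the ClassTag while carrying the partitioner along.

```scala
import scala.reflect.ClassTag

object PreservePartitionerSketch {
  // Toy stand-in for an RDD: element tag, optional partitioner label, data.
  final case class ToyRDD[T](
      tag: ClassTag[T],
      partitioner: Option[String],
      partitions: Vector[Vector[T]]) {

    // Same shape as mapPartitions: the partitioner survives only on request.
    def mapPartitions[U](
        f: Iterator[T] => Iterator[U],
        preservesPartitioning: Boolean = false)(
        implicit newTag: ClassTag[U]): ToyRDD[U] =
      ToyRDD(
        newTag,
        if (preservesPartitioning) partitioner else None,
        partitions.map(p => f(p.iterator).toVector))

    // Retag by passing each partition's iterator through unchanged.
    def retag(newTag: ClassTag[T]): ToyRDD[T] =
      mapPartitions[T](it => it, preservesPartitioning = true)(newTag)
  }
}
```

In this model, `retag` keeps both the data and the partitioner label and changes only the tag, whereas a plain `mapPartitions` call drops the partitioner.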



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50548521
  
    Another option would be to add `collectSeq` or something similar that returns a type with reasonable variance semantics.
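One way to read this suggestion, as a hedged sketch: `collectSeq` exists only as a proposal in this thread, and `VarianceSketch` with its two methods is hypothetical. A Seq's element type is erased at runtime, so a Seq built from mistyped metadata can still be viewed at the right type, while a JVM array carries its element class and the equivalent cast is checked.

```scala
object VarianceSketch {
  // Seq element types are erased, so this cast is not checked at runtime.
  def asStringSeq(items: Seq[AnyRef]): Seq[String] =
    items.asInstanceOf[Seq[String]]

  // Arrays are reified: an Object[] cannot be cast to a String[].
  def asStringArray(items: Array[AnyRef]): Array[String] =
    items.asInstanceOf[Array[String]]
}
```

With string elements, `asStringSeq(Seq("a", "b"))` succeeds, and `asStringArray(Array[AnyRef]("a", "b"))` throws a ClassCastException.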



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50714857
  
    Alright, I've merged this.  Thanks for the review!



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50541467
  
    /cc @mengxr @jkbradley @mateiz 



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50576414
  
    My last commit made `classTag` implicit in the retag() method, so in many cases the Scala code can be written as `someJavaRDD.rdd.retag.[...].collect()`.
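A small sketch of how an implicit ClassTag parameter picks up the right tag from the caller's scope. `Tagged`, `fromJava`, `labels`, and `labelsWithoutRetag` are hypothetical names, and `Tagged` is a toy container rather than an RDD; the fake tag is assumed to be an AnyRef tag cast to the element type.

```scala
import scala.reflect.ClassTag

object ImplicitRetagSketch {
  // Toy container that allocates its result array from the stored tag.
  final class Tagged[T](val items: Seq[T], val tag: ClassTag[T]) {
    // The compiler supplies newTag from the static type at the call site.
    def retag(implicit newTag: ClassTag[T]): Tagged[T] = new Tagged(items, newTag)
    def collect(): Array[T] = items.toArray(tag)
  }

  // Assumption: data arriving from the Java side carries an AnyRef tag.
  def fromJava[T](items: Seq[T]): Tagged[T] =
    new Tagged(items, ClassTag.AnyRef.asInstanceOf[ClassTag[T]])

  // T is String here, so retag receives ClassTag[String] without being told.
  def labels(): Array[String] = fromJava(Seq("x", "y")).retag.collect()

  // Without retag the array is an Object[] and the cast to String[] fails.
  def labelsWithoutRetag(): Array[String] = fromJava(Seq("x", "y")).collect()
}
```

The call in `labels()` has the same shape as `someJavaRDD.rdd.retag.[...].collect()`: no tag is written out, yet the array is allocated with the caller's element type.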



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1639#discussion_r15568992
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1239,6 +1239,28 @@ abstract class RDD[T: ClassTag](
       /** The [[org.apache.spark.SparkContext]] that this RDD was created on. */
       def context = sc
     
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(cls: Class[T]): RDD[T] = {
    +    val classTag: ClassTag[T] = ClassTag.apply(cls)
    +    this.retag(classTag)
    +  }
    +
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(classTag: ClassTag[T]): RDD[T] = {
    +    val oldRDD = this
    +    new RDD[T](sc, Seq(new OneToOneDependency(this)))(classTag) {
    +      override protected def getPartitions: Array[Partition] = oldRDD.getPartitions
    +      override def compute(split: Partition, context: TaskContext): Iterator[T] =
    +        oldRDD.compute(split, context)
    +    }
    --- End diff --
    
    Actually, compute just works at the iterator level, so I don't think mapPartitions would hurt. All you do is pass through the parent's iterator. When you call compute() you're already deserializing the RDD, so this won't create extra work.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50577126
  
    QA results for PR 1639:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17418/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50574640
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17418/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1639#discussion_r15567618
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1239,6 +1239,28 @@ abstract class RDD[T: ClassTag](
       /** The [[org.apache.spark.SparkContext]] that this RDD was created on. */
       def context = sc
     
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(cls: Class[T]): RDD[T] = {
    +    val classTag: ClassTag[T] = ClassTag.apply(cls)
    +    this.retag(classTag)
    +  }
    +
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(classTag: ClassTag[T]): RDD[T] = {
    +    val oldRDD = this
    +    new RDD[T](sc, Seq(new OneToOneDependency(this)))(classTag) {
    +      override protected def getPartitions: Array[Partition] = oldRDD.getPartitions
    +      override def compute(split: Partition, context: TaskContext): Iterator[T] =
    +        oldRDD.compute(split, context)
    +    }
    --- End diff --
    
    Would there be any performance impact of running `mapPartitions(identity, preservesPartitioning = true)(classTag)`?  If we have an RDD that's persisted in a serialized format, wouldn't this extra map force an unnecessary deserialization?



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50576330
  
    This method is intended to be called by Scala classes that implement Java-friendly wrappers for the Spark Scala API. For instance, MLlib has APIs that accept RDD[LabeledPoint]. Ideally, the Java wrapper code can simply call the underlying Scala methods without having to worry about how they're implemented. Therefore, I think we should prefer the `retag()`-based approach, since `collectSeq` would require us to modify the Scala consumer of the RDD.
    
    Since this is a private, internal API, we should be able to revisit this decision if we change our minds later.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50677960
  
    QA results for PR 1639:
    - This patch FAILED unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17474/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50580659
  
    QA results for PR 1639:
    - This patch FAILED unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17426/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50559860
  
    I'm going to take another pass on this to see if I can implicitly grab the ClassTag from the caller's scope, so hold off on merging this for a bit. 



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50672317
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17474/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50576705
  
    Sure, sounds good. Did you see my comments on preserving partitions too though?



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50581349
  
    Basically it's a shorter way of writing what you wrote. Take a look at MapPartitionsRDD.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50547477
  
    QA results for PR 1639:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17383/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50541642
  
    QA results for PR 1639:
    - This patch FAILED unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17381/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50671915
  
    Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50653500
  
    I've updated this to use mapPartitions().



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1639#discussion_r15568957
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1239,6 +1239,28 @@ abstract class RDD[T: ClassTag](
       /** The [[org.apache.spark.SparkContext]] that this RDD was created on. */
       def context = sc
     
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(cls: Class[T]): RDD[T] = {
    +    val classTag: ClassTag[T] = ClassTag.apply(cls)
    +    this.retag(classTag)
    +  }
    +
    +  /**
    +   * Private API for changing an RDD's ClassTag.
    +   * Used for internal Java <-> Scala API compatibility.
    +   */
    +  private[spark] def retag(classTag: ClassTag[T]): RDD[T] = {
    +    val oldRDD = this
    +    new RDD[T](sc, Seq(new OneToOneDependency(this)))(classTag) {
    +      override protected def getPartitions: Array[Partition] = oldRDD.getPartitions
    +      override def compute(split: Partition, context: TaskContext): Iterator[T] =
    +        oldRDD.compute(split, context)
    +    }
    --- End diff --
    
    Sure, the fix with just passing the partitioner also works.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50542172
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17383/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50577408
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17426/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50581291
  
    In case you don't see the hidden comment above: I don't think mapPartitions would hurt performance here. All you do is pass through the parent's iterator. When you call compute() you're already deserializing the RDD, so this won't create extra work in that case.



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50711863
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17550/consoleFull



[GitHub] spark pull request: [SPARK-2737] Add retag() method for changing R...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1639#issuecomment-50653869
  
    QA tests have started for PR 1639. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17458/consoleFull

