You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by punya <gi...@git.apache.org> on 2014/02/14 04:08:30 UTC

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

GitHub user punya opened a pull request:

    https://github.com/apache/incubator-spark/pull/600

    Add subtractByKey to the JavaPairRDD wrapper

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark subtractByKey-java

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #600
    
----
commit 5df45fb01bddd784cc28ddc7b23369ea52c9edb7
Author: Punya Biswal <pb...@palantir.com>
Date:   2014-02-14T03:00:38Z

    Add subtractByKey to the JavaPairRDD wrapper

----


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35058750
  
    Jenkins, test this please.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35114288
  
    Merged build started.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35114287
  
     Merged build triggered.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35116858
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12719/


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35114189
  
    Jenkins, this is ok to test.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35058812
  
     Merged build triggered.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by punya <gi...@git.apache.org>.
Github user punya closed the pull request at:

    https://github.com/apache/incubator-spark/pull/600


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35058869
  
    Merged build finished.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by punya <gi...@git.apache.org>.
Github user punya commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35226354
  
    Thanks @aarondav! I wondered about the dishonest `ClassTag`s too but convinced myself that in this case it was totally harmless because `subtractByKey` ignores the second component of the pair anyway.
    
    I'll work on a separate PR to rename the variables `cmw → wTag` across `JavaRDD` and friends.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35208921
  
    To answer my own question, the reason we do this is probably because we have no choice when integrating with Java. Additionally, use of ClassTag as in our API rarely affects correctness, just potentially safety and performance (e.g., creating a primitive-backed map versus an object one).
    
    This unfortunately adds a burden we must keep in mind as we develop the Spark Scala APIs:
    
    - ClassTags should be used sparingly and never relied upon for correctness.
    - The Java API may be unavoidably slower than the Scala API where ClassTags are used for perf.
    
    This also means it is very hard, in general, to verify the correctness of a Java API like the one introduced in this patch, since we're tricking the compiler and introducing unusual runtime behavior.
    
    In this particular case, I can verify that the API will work because subtractByKey never actually uses W's ClassTag (I can even remove it from the signature without issue), but that's not guaranteed to hold in the future.


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35207151
  
    CC @pwendell and @JoshRosen 
    
    This looks good to me in that it should work in the style elsewhere in this file, but I have a question for someone more knowledgeable with this area of the code than I. This line of code (used analogously throughout this file):
    
    `implicit val cmw: ClassTag[W] = implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[W]]`
    
    doesn't actually get a ClassTag[W], it just gets ClassTag.AnyRef and dresses it up like one. If any of the Scala APIs really needed W's ClassTag, they would fail. But AnyRef can make a poor approximation of W for creating stuff like Arrays, which can certainly hold W's, but also anything else. Are we just lucky that this works right now, or is there a deeper reason for why this is a good idea?


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35058870
  
    One or more automated tests failed
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12713/


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35226744
  
    No need for your renaming PR @punya -- I've already taken care of it in #604 :) Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#discussion_r9778178
  
    "cm" is back from when these were ClassManifests. Everywhere else it's still called that, though, so maybe this is a minor cleanup for another PR.


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by punya <gi...@git.apache.org>.
Github user punya commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35097483
  
    @pwendell, the build hasn't retriggered even though I updated my patch. Could you please ask Jenkins to test this pull request.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35225376
  
    This PR doesn't need to block on this discussion, since it doesn't actually rely on the fake ClassTag, so I've merged it into master.
    
    I've created [JIRA-1096](https://spark-project.atlassian.net/browse/SPARK-1098) to track any further discussion and PR #604 to add documentation for this issue. I will update #604 to include subtractByKey once this gets merged into github.
    
    Thanks, @punya!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35058813
  
    Merged build started.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35116857
  
    Merged build finished.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by punya <gi...@git.apache.org>.
Github user punya commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35113788
  
    Jenkins, test this please.


[GitHub] incubator-spark pull request: Add subtractByKey to the JavaPairRDD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/600#issuecomment-35051869
  
    Can one of the admins verify this patch?