You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sryza <gi...@git.apache.org> on 2014/07/17 10:27:26 UTC

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

GitHub user sryza opened a pull request:

    https://github.com/apache/spark/pull/1461

    SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency...

    ... per key
    
    My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/spark sandy-spark-2553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1461
    
----
commit 7eaf7f2a18ddea7ee47aacb5c5559c278a924899
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-07-17T08:19:48Z

    SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49386013
  
    QA tests have started for PR 1461. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16800/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49402486
  
    Fix: https://github.com/apache/spark/pull/1479


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49401094
  
    Merged this, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49385926
  
    Jenkins, add to whitelist and test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49386170
  
    QA results for PR 1461:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16800/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49386002
  
    Looks good to me, though this isn't avoiding *that* much since you are appending to an ArrayBuffer anyway. But no reason not to do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49406581
  
    I actually did consider this, I should have made a note in the PR.  Agreed that we should be careful with the implications of these small changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49405145
  
    This PR also changed the behavior with respect to mutating CoGroups -- now we mutate combiner1 in place rather than returning a new ArrayBuffer. This is _probably_ not an issue given our current usage of these combiners, but we should be careful when making such seemingly innocuous changes, as they are certainly not worth introducing a bug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49496955
  
    Oops, sorry about that, the Jenkins messages look too similar now :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1461


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1461#issuecomment-49401593
  
    It looks like there was actually a compile error here that I missed last night.  Uploading the fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---