Posted to reviews@spark.apache.org by maji2014 <gi...@git.apache.org> on 2014/12/02 10:59:29 UTC

[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

GitHub user maji2014 opened a pull request:

    https://github.com/apache/spark/pull/3553

    [spark-4691][shuffle]code optimization for judgement

    In HashShuffleReader.scala and HashShuffleWriter.scala, there is no need to check "dep.aggregator.isEmpty" again, since the preceding branch already checks "dep.aggregator.isDefined".
    
    In SortShuffleWriter.scala, wouldn't "dep.aggregator.isEmpty" read better than "!dep.aggregator.isDefined"?
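    
    For context, here is a hedged sketch (plain Scala, not the actual Spark source; `aggregator` and `mapSideCombine` stand in for the fields on the shuffle dependency) of the branching this change simplifies:
    
    ```scala
    // Standalone sketch of the redundant check: the else-if re-tests what
    // falling through the first branch already implies.
    object BranchSketch {
      def describe(aggregator: Option[String], mapSideCombine: Boolean): String = {
        if (aggregator.isDefined) {
          "combine values with the aggregator"
        } else if (aggregator.isEmpty && mapSideCombine) { // isEmpty is always true here
          "error: map-side combine requested without an aggregator"
        } else {
          "pass values through unchanged"
        }
      }
    
      def main(args: Array[String]): Unit = {
        println(describe(Some("agg"), mapSideCombine = true))  // combine branch
        println(describe(None, mapSideCombine = true))         // error branch
        println(describe(None, mapSideCombine = false))        // pass-through branch
      }
    }
    ```
    
    Dropping the `aggregator.isEmpty` term from the second condition does not change which branch is taken; it only removes the repeated test.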
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maji2014/spark spark-4691

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3553
    
----
commit d8f52dc7dc34bc6c1c368d790b7bdfe30c4eb529
Author: maji2014 <ma...@asiainfo.com>
Date:   2014-12-02T09:54:33Z

    code optimization for judgement

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4691][shuffle] Restructure a few lines ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3553




[GitHub] spark pull request: [SPARK-4691][shuffle] Restructure a few lines ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-67385829
  
    I've merged this into `branch-1.2`.




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65207158
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21205859
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
    @@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
           } else {
             new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
           }
    -    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
    +    } else if (dep.mapSideCombine) {
    --- End diff --
    
    I think the previous way is much clearer and more obvious, from my understanding :-).




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21213023
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
    @@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
           } else {
             new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
           }
    -    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
    +    } else if (dep.mapSideCombine) {
    --- End diff --
    
    Could also write this as
    
    ```scala
    if (dep.aggregator.isDefined) {
      ...
    } else {
      require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")
    
      // Convert the Product2s to pairs since this is what downstream RDDs currently expect
      iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
    }
    ```




[GitHub] spark pull request: [spark-4691][shuffle]Code improvement for aggr...

Posted by maji2014 <gi...@git.apache.org>.
Github user maji2014 commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65761712
  
    @pwendell any idea about this title?




[GitHub] spark pull request: [spark-4691][shuffle]Code improvement for aggr...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65862751
  
    We should probably just merge it into master, so it stays independent of the 1.2 release, right?




[GitHub] spark pull request: [SPARK-4691][Minor] Rewrite a few lines in shu...

Posted by maji2014 <gi...@git.apache.org>.
Github user maji2014 commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65904987
  
    NP, the title change is done.




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by maji2014 <gi...@git.apache.org>.
Github user maji2014 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21213167
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
    @@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
           } else {
             new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
           }
    -    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
    +    } else if (dep.mapSideCombine) {
    --- End diff --
    
    Yes, this seems simple and elegant




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by maji2014 <gi...@git.apache.org>.
Github user maji2014 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21213166
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
    @@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
           } else {
             new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
           }
    -    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
    +    } else if (dep.mapSideCombine) {
    --- End diff --
    
    Yes, this seems simple and elegant




[GitHub] spark pull request: [SPARK-4691][shuffle] Restructure a few lines ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-66358597
  
    @aarondav I wanted to backport this into branch-1.2 as well. It would be good to minimize the divergence between master and 1.2 if possible.
    
    I'm merging this into master now and I'll mark it `backport-needed` on the JIRA. Thanks.




[GitHub] spark pull request: [spark-4691][shuffle]Code improvement for aggr...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3553#issuecomment-65862084
  
    I would use `[SPARK-4691][Minor] Rewrite a few lines in shuffle code`. Not a big deal if you don't change this. I'll merge this once we cut the new 1.2 release.




[GitHub] spark pull request: [spark-4691][shuffle]code optimization for jud...

Posted by maji2014 <gi...@git.apache.org>.
Github user maji2014 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3553#discussion_r21207386
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleReader.scala ---
    @@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
           } else {
             new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
           }
    -    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
    +    } else if (dep.mapSideCombine) {
    --- End diff --
    
    "if(dep.aggregator.isDefined) else if (dep.aggregator.isEmpty)"  seems duplicate. 
    isEmpty == !isDefined
    We need to do another one more judgement for "dep.aggregator.isEmpty".
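    
    A quick illustration of that equivalence (standard scala.Option behaviour, not Spark code):
    
    ```scala
    // For any Option, isEmpty is exactly the negation of isDefined,
    // so checking isEmpty inside the else of an isDefined test adds nothing.
    object OptionEquivalence {
      def main(args: Array[String]): Unit = {
        val some: Option[Int] = Some(1)
        val none: Option[Int] = None
    
        assert(some.isEmpty == !some.isDefined)  // both false
        assert(none.isEmpty == !none.isDefined)  // both true
        println("isEmpty is always the negation of isDefined")
      }
    }
    ```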

