You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by 10110346 <gi...@git.apache.org> on 2018/02/11 09:21:36 UTC

[GitHub] spark pull request #20576: [CORE]The shuffle dependency specifies aggregatio...

GitHub user 10110346 opened a pull request:

    https://github.com/apache/spark/pull/20576

    [CORE]The shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`,it seems that serialized sorting can be used

    ## What changes were proposed in this pull request?
    The shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`,it seems that serialized sorting can be used.
    
    ## How was this patch tested?
    Existing unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark mapsidecombine

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20576.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20576
    
----
commit ef6d3f1361395905e43b01e1bbf7f0d2f7167cac
Author: liuxian <li...@...>
Date:   2018-02-11T08:44:16Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20576#discussion_r171279296
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
    @@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
           log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
             s"${dependency.serializer.getClass.getName}, does not support object relocation")
           false
    -    } else if (dependency.aggregator.isDefined) {
    -      log.debug(
    -        s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
    +    } else if (dependency.mapSideCombine) {
    +      require(dependency.aggregator.isDefined, "Map-side combine without Aggregator specified!")
    --- End diff --
    
    can we move this `require` to the constructor of `ShuffleDependency`? It appears many times in the codebase.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20576#discussion_r171437565
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
    @@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
           log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
             s"${dependency.serializer.getClass.getName}, does not support object relocation")
           false
    -    } else if (dependency.aggregator.isDefined) {
    -      log.debug(
    -        s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
    --- End diff --
    
    Thanks, i will update it


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    **[Test build #87308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87308/testReport)** for PR 20576 at commit [`ef6d3f1`](https://github.com/apache/spark/commit/ef6d3f1361395905e43b01e1bbf7f0d2f7167cac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    **[Test build #87808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1172/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87808/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20576


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/792/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on the issue:

    https://github.com/apache/spark/pull/20576
  
      @cloud-fan  @JoshRosen Could you help review it ? thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    **[Test build #87308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87308/testReport)** for PR 20576 at commit [`ef6d3f1`](https://github.com/apache/spark/commit/ef6d3f1361395905e43b01e1bbf7f0d2f7167cac).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20576#discussion_r171452995
  
    --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
    @@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
           mapSideCombine = false
         )))
     
    +    // We support serialized shuffle if we do not need to do map-side aggregation
    +    assert(canUseSerializedShuffle(shuffleDep(
    +      partitioner = new HashPartitioner(2),
    +      serializer = kryo,
    +      keyOrdering = None,
    +      aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
    +      mapSideCombine = false
    --- End diff --
    
    You can see this code: `def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87308/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20576#discussion_r171278411
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
    @@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
           log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
             s"${dependency.serializer.getClass.getName}, does not support object relocation")
           false
    -    } else if (dependency.aggregator.isDefined) {
    -      log.debug(
    -        s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
    --- End diff --
    
    can we keep the log with a little update?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20576#discussion_r171451103
  
    --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
    @@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
           mapSideCombine = false
         )))
     
    +    // We support serialized shuffle if we do not need to do map-side aggregation
    +    assert(canUseSerializedShuffle(shuffleDep(
    +      partitioner = new HashPartitioner(2),
    +      serializer = kryo,
    +      keyOrdering = None,
    +      aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
    +      mapSideCombine = false
    --- End diff --
    
    Under what scenario will ```mapSideCombine``` be ```false```, but an ```aggregator ```  set ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20576
  
    **[Test build #87808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org