You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by 10110346 <gi...@git.apache.org> on 2018/02/11 09:21:36 UTC
[GitHub] spark pull request #20576: [CORE]The shuffle dependency specifies aggregatio...
GitHub user 10110346 opened a pull request:
https://github.com/apache/spark/pull/20576
[CORE]The shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`,it seems that serialized sorting can be used
## What changes were proposed in this pull request?
The shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`,it seems that serialized sorting can be used.
## How was this patch tested?
Existing unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/10110346/spark mapsidecombine
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20576.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20576
----
commit ef6d3f1361395905e43b01e1bbf7f0d2f7167cac
Author: liuxian <li...@...>
Date: 2018-02-11T08:44:16Z
fix
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20576#discussion_r171279296
--- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
@@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
s"${dependency.serializer.getClass.getName}, does not support object relocation")
false
- } else if (dependency.aggregator.isDefined) {
- log.debug(
- s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
+ } else if (dependency.mapSideCombine) {
+ require(dependency.aggregator.isDefined, "Map-side combine without Aggregator specified!")
--- End diff --
can we move this `require` to the constructor of `ShuffleDependency`? It appears many times in the codebase.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20576#discussion_r171437565
--- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
@@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
s"${dependency.serializer.getClass.getName}, does not support object relocation")
false
- } else if (dependency.aggregator.isDefined) {
- log.debug(
- s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
--- End diff --
Thanks, i will update it
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20576
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20576
**[Test build #87308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87308/testReport)** for PR 20576 at commit [`ef6d3f1`](https://github.com/apache/spark/commit/ef6d3f1361395905e43b01e1bbf7f0d2f7167cac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20576
**[Test build #87808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1172/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87808/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20576
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/792/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on the issue:
https://github.com/apache/spark/pull/20576
@cloud-fan @JoshRosen Could you help review it ? thanks
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20576
**[Test build #87308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87308/testReport)** for PR 20576 at commit [`ef6d3f1`](https://github.com/apache/spark/commit/ef6d3f1361395905e43b01e1bbf7f0d2f7167cac).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by 10110346 <gi...@git.apache.org>.
Github user 10110346 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20576#discussion_r171452995
--- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
@@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
mapSideCombine = false
)))
+ // We support serialized shuffle if we do not need to do map-side aggregation
+ assert(canUseSerializedShuffle(shuffleDep(
+ partitioner = new HashPartitioner(2),
+ serializer = kryo,
+ keyOrdering = None,
+ aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
+ mapSideCombine = false
--- End diff --
You can see this code: `def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [CORE]The shuffle dependency specifies aggregation ,and ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87308/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20576#discussion_r171278411
--- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
@@ -188,9 +188,8 @@ private[spark] object SortShuffleManager extends Logging {
log.debug(s"Can't use serialized shuffle for shuffle $shufId because the serializer, " +
s"${dependency.serializer.getClass.getName}, does not support object relocation")
false
- } else if (dependency.aggregator.isDefined) {
- log.debug(
- s"Can't use serialized shuffle for shuffle $shufId because an aggregator is defined")
--- End diff --
can we keep the log with a little update?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20576#discussion_r171451103
--- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
@@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
mapSideCombine = false
)))
+ // We support serialized shuffle if we do not need to do map-side aggregation
+ assert(canUseSerializedShuffle(shuffleDep(
+ partitioner = new HashPartitioner(2),
+ serializer = kryo,
+ keyOrdering = None,
+ aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
+ mapSideCombine = false
--- End diff --
Under what scenario will ```mapSideCombine``` be ```false```, but an ```aggregator ``` set ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20576
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20576
**[Test build #87808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org