Posted to commits@spark.apache.org by we...@apache.org on 2018/12/10 14:37:26 UTC
spark git commit: [MINOR][DOC] Update the condition description of serialized shuffle
Repository: spark
Updated Branches:
refs/heads/master 42e8c381b -> 979492327
[MINOR][DOC] Update the condition description of serialized shuffle
## What changes were proposed in this pull request?
`1. The shuffle dependency specifies no aggregation or output ordering.`
If the shuffle dependency specifies aggregation but aggregates only on the reduce side (no map-side combine), serialized shuffle can still be used.
`3. The shuffle produces fewer than 16777216 output partitions.`
If the number of output partitions is exactly 16777216, serialized shuffle can still be used.
See the method `canUseSerializedShuffle`.
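The three conditions, as worded in the updated comment, can be sketched roughly as follows. This is an illustrative sketch only: `DepSketch`, its fields, and `SerializedShuffleSketch` are hypothetical stand-ins and are not Spark's actual `ShuffleDependency` or `SortShuffleManager` code.

```scala
// Hypothetical stand-in for the properties of a shuffle dependency that
// the three documented conditions refer to (not Spark's real class).
final case class DepSketch(
    supportsRelocation: Boolean, // condition 2: serializer can relocate serialized values
    mapSideCombine: Boolean,     // condition 1: must be false
    numPartitions: Int)          // condition 3: must not exceed the limit

object SerializedShuffleSketch {
  // 16777216 = 2^24, the partition limit quoted in the updated comment.
  val MaxPartitions: Int = 1 << 24

  // True only when all three documented conditions hold.
  def canUseSerializedShuffle(dep: DepSketch): Boolean =
    dep.supportsRelocation &&
      !dep.mapSideCombine &&
      dep.numPartitions <= MaxPartitions
}
```

Under this sketch, a dependency with exactly 16777216 partitions, a relocatable serializer, and reduce-side-only aggregation qualifies, which matches the two corrections described above.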
## How was this patch tested?
N/A
Closes #23228 from 10110346/SerializedShuffle_doc.
Authored-by: liuxian <li...@zte.com.cn>
Signed-off-by: Wenchen Fan <we...@databricks.com>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/97949232
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/97949232
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/97949232
Branch: refs/heads/master
Commit: 9794923272c26ee5ba760a57718a368c33d09f04
Parents: 42e8c38
Author: liuxian <li...@zte.com.cn>
Authored: Mon Dec 10 22:37:17 2018 +0800
Committer: Wenchen Fan <we...@databricks.com>
Committed: Mon Dec 10 22:37:17 2018 +0800
----------------------------------------------------------------------
.../scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/97949232/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
index b51a843..b59fa8e 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
@@ -33,10 +33,10 @@ import org.apache.spark.shuffle._
* Sort-based shuffle has two different write paths for producing its map output files:
*
* - Serialized sorting: used when all three of the following conditions hold:
- * 1. The shuffle dependency specifies no aggregation or output ordering.
+ * 1. The shuffle dependency specifies no map-side combine.
* 2. The shuffle serializer supports relocation of serialized values (this is currently
* supported by KryoSerializer and Spark SQL's custom serializers).
- * 3. The shuffle produces fewer than 16777216 output partitions.
+ * 3. The shuffle produces fewer than or equal to 16777216 output partitions.
* - Deserialized sorting: used to handle all other cases.
*
* -----------------------