Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/05 03:00:15 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

zhengruifeng opened a new pull request #31480:
URL: https://github.com/apache/spark/pull/31480


   ### What changes were proposed in this pull request?
   Avoid the shuffle in `repartitionAndSortWithinPartitions` when the RDD is already partitioned by the given partitioner.
   
   
   ### Why are the changes needed?
   To avoid an unnecessary shuffle.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added test suites, plus the existing test suites.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797459535


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136000/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801160689


   retest this please




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782557549


   **[Test build #135291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135291/testReport)** for PR 31480 at commit [`8a8aadd`](https://github.com/apache/spark/commit/8a8aadd0816a77af40feb51264bfaf643458e072).




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782576168








[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774842883


   **[Test build #134998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594379874



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,
+   * and releases resources by calling `stop()`.
+   */
+  def completionIterator: Iterator[Product2[K, C]] = {
+    context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
+    context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)
+    context.taskMetrics().incPeakExecutionMemory(peakMemoryUsedBytes)

Review comment:
       I think the metrics updates implicitly require that records have already been inserted; otherwise, updating them is meaningless. So, shall we call `insertAll` here too? e.g.,
   
   ```scala
   def completionIterator(records: Iterator[Product2[K, V]]): Iterator[Product2[K, C]] = {
     insertAll(records)
     context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
     ...
   }
   ```
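   
   The cleanup half of this idea (release resources once the output iterator is exhausted) can be sketched without Spark internals. This is a minimal illustration, not Spark's `CompletionIterator`; the names `OnCompletion` and `OnCompletionDemo` are hypothetical, and the callback stands in for the `sorter.stop()` call in the diff.
   
   ```scala
   // Hypothetical helper: wraps an iterator and runs `onComplete` exactly once,
   // the first time the underlying iterator reports that it is exhausted.
   final class OnCompletion[A](underlying: Iterator[A], onComplete: () => Unit)
       extends Iterator[A] {
     private var completed = false
   
     override def hasNext: Boolean = {
       val more = underlying.hasNext
       if (!more && !completed) {
         completed = true
         onComplete() // e.g. stop a sorter and release its buffers
       }
       more
     }
   
     override def next(): A = underlying.next()
   }
   
   object OnCompletionDemo {
     def main(args: Array[String]): Unit = {
       var stopCalls = 0
       val records = new OnCompletion(Iterator(1, 2, 3), () => stopCalls += 1)
       records.foreach(println)
       println(s"cleanup ran $stopCalls time(s)")
     }
   }
   ```
   
   A wrapper like this only fires when the consumer drains the iterator, which is presumably why the diff also registers a task completion listener for the cancelled or failed case.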
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789659087


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135702/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-803229759


   Thanks, merged to Master!




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773787412


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39485/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789603343


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40284/
   




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773752136


   The previous PR https://github.com/apache/spark/pull/29185 is stale, so I opened this one.
   
   
   Test code:
   ```scala
   import org.apache.spark.HashPartitioner
   
   val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
   val partitioner = new HashPartitioner(2)
   // Aggregate with the partitioner, so `agg` is already partitioned by it.
   val agg = data.reduceByKey(partitioner, _ + _)
   agg.persist()
   agg.count
   // Sort within partitions using the same partitioner.
   val sorted = agg.repartitionAndSortWithinPartitions(partitioner)
   sorted.count
   ```
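   
   Besides comparing the UI screenshots, the same thing can be checked programmatically by looking at the lineage of the result. The following is a rough sketch, assuming Spark is on the classpath and a local master is acceptable; `ShuffleCheck` and `addsShuffle` are hypothetical names introduced here for illustration.
   
   ```scala
   import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
   import org.apache.spark.rdd.RDD
   
   object ShuffleCheck {
     /** Returns true if `rdd` depends on a direct parent through a shuffle. */
     def addsShuffle(rdd: RDD[_]): Boolean =
       rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])
   
     def main(args: Array[String]): Unit = {
       val conf = new SparkConf().setMaster("local[2]").setAppName("shuffle-check")
       val sc = new SparkContext(conf)
       try {
         val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
         val partitioner = new HashPartitioner(2)
         val agg = data.reduceByKey(partitioner, _ + _)
         val sorted = agg.repartitionAndSortWithinPartitions(partitioner)
   
         // `true` means the sort step introduced its own shuffle stage.
         println(s"sort step adds a shuffle: ${addsShuffle(sorted)}")
         // The full lineage, for a manual look at the stage boundaries.
         println(sorted.toDebugString)
       } finally {
         sc.stop()
       }
     }
   }
   ```
   
   Which value is printed depends on the Spark build in use, so no expected output is given here.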
   
   master:
   ![repart-master-2021-02-05-10-58-46](https://user-images.githubusercontent.com/7322292/106984081-dff98780-67a1-11eb-801e-1c2a205b5bd0.png)
   
   this pr:
   ![repart-pr-2021-02-05-10-57-00](https://user-images.githubusercontent.com/7322292/106984096-e5ef6880-67a1-11eb-8ef6-1a0dea8bd0b1.png)
   
   




[GitHub] [spark] mridulm commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r579492908



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        // Use completion callback to stop sorter if task was finished/cancelled.
+        context.addTaskCompletionListener[Unit](_ => sorter.stop)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)
+      }, preservesPartitioning = true)
+    } else {

Review comment:
       I probably mentioned this in the earlier version of this PR as well.
   Can we unify the code in this `if` block with what is in `BlockStoreShuffleReader`?
   We are duplicating it here.






[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800040296


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136099/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-781891645


   cc @tgravescs @mridulm 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799155672


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136049/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-781891397


   LGTM




[GitHub] [spark] zhengruifeng commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r579585816



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        // Use completion callback to stop sorter if task was finished/cancelled.
+        context.addTaskCompletionListener[Unit](_ => sorter.stop)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)
+      }, preservesPartitioning = true)
+    } else {

Review comment:
       I misunderstood your previous comments. I am going to unify this in some way.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799083659


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40632/
   




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r571781996



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)

Review comment:
       These two may not be accurate, since `sorter.iterator` could also spill during traversal?
   
   Although I also see other places use it in the same way.






[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800729669


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136133/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789620065


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40284/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789623041


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40284/
   




[GitHub] [spark] sarutak commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802542418


   This GA failure seems to be the one fixed in c5cadfe.
   If you rebase onto `master`, the failure should go away.




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-788856282








[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802583314


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40821/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799155672


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136049/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773813064








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773813064








[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774585683


   **[Test build #134969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134969/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782574829


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39870/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801561161


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136184/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773777161


   **[Test build #134903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134903/testReport)** for PR 31480 at commit [`50e8d73`](https://github.com/apache/spark/commit/50e8d7315eca5c6aef4dbbd976d56fb2fb65adb2).




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789659087


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135702/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774598904








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801560509


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40766/
   




[GitHub] [spark] mridulm commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799185646


   Looks good to me.
   +CC @Ngone51, @HyukjinKwon 




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-788827744


   retest this please




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773827836


   **[Test build #134903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134903/testReport)** for PR 31480 at commit [`50e8d73`](https://github.com/apache/spark/commit/50e8d7315eca5c6aef4dbbd976d56fb2fb65adb2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782563371


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39870/
   




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801546172


   retest this please




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797441233


   **[Test build #136000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136000/testReport)** for PR 31480 at commit [`a3b9063`](https://github.com/apache/spark/commit/a3b90638c15239766c2876f5f999f970914470b6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774835191


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-788856282








[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800726771


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40715/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800729622








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148








[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-803709957


   Thank you so much, guys!




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773797320


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39485/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789575543


   **[Test build #135702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135702/testReport)** for PR 31480 at commit [`b751280`](https://github.com/apache/spark/commit/b751280ec466cd665e88ed8e36df968b2afb82af).




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782557549


   **[Test build #135291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135291/testReport)** for PR 31480 at commit [`8a8aadd`](https://github.com/apache/spark/commit/8a8aadd0816a77af40feb51264bfaf643458e072).




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r571781670



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,21 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)

Review comment:
       This seems to be just a double-check with no side effects, so it should be fine.
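
The idea behind the change quoted above (skip the shuffle when the RDD is already partitioned by the requested partitioner, and only sort inside each partition) can be illustrated with a small, Spark-free model. This is a rough sketch, not the code in the PR: `SimplePartitioner`, `ModPartitioner`, `PartitionedData`, and `ShuffleSketch` are hypothetical names invented for this illustration, and the in-memory `sortBy` stands in for the `ExternalSorter` used in the diff.

```scala
// Hypothetical, Spark-free model of the optimization discussed in this thread.
trait SimplePartitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// A case class gives structural equality, so two partitioners built with the
// same arguments compare equal with ==.
final case class ModPartitioner(numPartitions: Int) extends SimplePartitioner {
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

// Data split into partitions, tagged with the partitioner that produced the layout.
final case class PartitionedData[K, V](
    partitions: Vector[Vector[(K, V)]],
    partitioner: Option[SimplePartitioner])

object ShuffleSketch {
  // Counts redistributions so that the shortcut is observable.
  var shuffles = 0

  // Stand-in for a shuffle: route every record to the partition chosen by p.
  def partitionBy[K, V](data: Seq[(K, V)], p: SimplePartitioner): PartitionedData[K, V] = {
    shuffles += 1
    val grouped = data.groupBy { case (k, _) => p.getPartition(k) }
    val parts = Vector.tabulate(p.numPartitions)(i => grouped.getOrElse(i, Seq.empty).toVector)
    PartitionedData(parts, Some(p))
  }

  def repartitionAndSortWithinPartitions[K: Ordering, V](
      in: PartitionedData[K, V],
      p: SimplePartitioner): PartitionedData[K, V] = {
    val placed =
      if (in.partitioner == Some(p)) in          // same partitioner: records are already in place
      else partitionBy(in.partitions.flatten, p) // otherwise redistribute first
    // In both cases, sort each partition by key.
    placed.copy(partitions = placed.partitions.map(_.sortBy(_._1)))
  }
}
```

In this model, calling `ShuffleSketch.repartitionAndSortWithinPartitions` with a partitioner equal to the one already attached leaves `shuffles` unchanged, while a different partitioner increments it.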






[GitHub] [spark] mridulm commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802139040


   @Ngone51 The failures seem unrelated to this change and look Jenkins-infra related, right?




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782575180


   **[Test build #135291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135291/testReport)** for PR 31480 at commit [`8a8aadd`](https://github.com/apache/spark/commit/8a8aadd0816a77af40feb51264bfaf643458e072).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800729622


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40715/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774855371


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39581/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799037184


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136046/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797369529


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40584/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800031445


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136098/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797369529


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40584/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797361967


   **[Test build #136000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136000/testReport)** for PR 31480 at commit [`a3b9063`](https://github.com/apache/spark/commit/a3b90638c15239766c2876f5f999f970914470b6).




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800045184


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40681/
   




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800712077


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-782576167








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802583314


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40821/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774590110


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39552/
   




[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773752136


   The previous PR, https://github.com/apache/spark/pull/29185, is stale, so I opened this one.
   
   
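   Test code:
   ```
   import org.apache.spark.HashPartitioner
   
   // Six key/value pairs spread over 2 partitions.
   val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
   val partitioner = new HashPartitioner(2)
   // reduceByKey shuffles once, so `agg` is already partitioned by `partitioner`.
   val agg = data.reduceByKey(partitioner, _ + _)
   agg.persist()
   agg.count
   // Same partitioner again: ideally this only sorts within each partition.
   val sorted = agg.repartitionAndSortWithinPartitions(partitioner)
   sorted.count
   ```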
   
   master:
   ![repart-master-2021-02-05-10-58-46](https://user-images.githubusercontent.com/7322292/106984081-dff98780-67a1-11eb-801e-1c2a205b5bd0.png)
   
   this pr:
   ![repart-pr-2021-02-05-10-57-00](https://user-images.githubusercontent.com/7322292/106984096-e5ef6880-67a1-11eb-8ef6-1a0dea8bd0b1.png)
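
To illustrate why the second shuffle in the test code above is redundant, here is a minimal sketch in plain Scala, with no Spark dependency. It only models hash partitioning with collections; `SamePartitionerSketch`, `partitionOf`, `shuffle`, and `sortWithin` are hypothetical names made up for this illustration, not Spark APIs, and the input values assume the sums that `reduceByKey` would produce for the sample data.

```scala
object SamePartitionerSketch {
  // Non-negative modulo of the key's hash, the usual rule behind hash partitioning.
  def partitionOf(key: Int, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Models a shuffle: every record is routed to the partition chosen by its key.
  def shuffle(parts: Vector[Vector[(Int, Int)]], n: Int): Vector[Vector[(Int, Int)]] = {
    val all = parts.flatten
    Vector.tabulate(n)(p => all.filter { case (k, _) => partitionOf(k, n) == p })
  }

  // Sorts each partition by key without moving records between partitions.
  def sortWithin(parts: Vector[Vector[(Int, Int)]]): Vector[Vector[(Int, Int)]] =
    parts.map(_.sortBy(_._1))

  def main(args: Array[String]): Unit = {
    val n = 2
    // Assumed per-key sums of the sample data, already placed by the partitioner.
    val aggregated = shuffle(Vector(Vector((3, 16), (0, 13)), Vector((1, 3), (2, 6))), n)

    val withShuffle = sortWithin(shuffle(aggregated, n)) // shuffle again, then sort
    val withoutShuffle = sortWithin(aggregated)          // sort in place only

    // Re-applying the same partitioner moves no record, so both results agree.
    assert(withShuffle == withoutShuffle)
    println(withoutShuffle)
  }
}
```

Because every record already sits in the partition its key maps to, the second routing step changes nothing, and only the per-partition sort does useful work.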
   
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773777161


   **[Test build #134903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134903/testReport)** for PR 31480 at commit [`50e8d73`](https://github.com/apache/spark/commit/50e8d7315eca5c6aef4dbbd976d56fb2fb65adb2).




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800031445


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136098/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802894757








[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773777161








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800068829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40681/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773777161


   **[Test build #134903 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134903/testReport)** for PR 31480 at commit [`50e8d73`](https://github.com/apache/spark/commit/50e8d7315eca5c6aef4dbbd976d56fb2fb65adb2).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800040296


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136099/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799152489


   **[Test build #136049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136049/testReport)** for PR 31480 at commit [`51fac28`](https://github.com/apache/spark/commit/51fac28e18da2dac4f3bd0ef99fc54ce0c49f928).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797459535


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136000/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774595114


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39552/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774893887


   **[Test build #134998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r571082348



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,21 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)

Review comment:
       I think it's better to register `sorter.stop` in a task completion listener, so that resources are always released even if the task fails.
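
The difference the review comment points at can be sketched in plain Scala, without Spark. This is an illustration under stated assumptions: `FakeTaskContext`, `FakeSorter`, `onExhaustion`, and `runTask` are hypothetical stand-ins invented here, not Spark classes. The sketch assumes that cleanup attached to an iterator runs only once the iterator is fully consumed, while a completion listener runs whenever the task ends.

```scala
import scala.collection.mutable.ArrayBuffer

object CleanupSketch {
  // Hypothetical stand-in for a task context; not Spark's TaskContext.
  final class FakeTaskContext {
    private val listeners = ArrayBuffer.empty[() => Unit]
    def addCompletionListener(f: () => Unit): Unit = { listeners += f }
    def markCompleted(): Unit = listeners.foreach(f => f())
  }

  // Hypothetical stand-in for a resource holder such as a sorter.
  final class FakeSorter {
    var stopped = false
    def stop(): Unit = { stopped = true }
  }

  // Runs `cleanup` only when the wrapped iterator reports that it is exhausted.
  def onExhaustion[A](it: Iterator[A], cleanup: () => Unit): Iterator[A] = new Iterator[A] {
    private var done = false
    def hasNext: Boolean = {
      val more = it.hasNext
      if (!more && !done) { done = true; cleanup() }
      more
    }
    def next(): A = it.next()
  }

  // Runs a task body and always fires the completion listeners, even on failure.
  def runTask(ctx: FakeTaskContext)(body: => Unit): Boolean =
    try { body; true }
    catch { case _: RuntimeException => false }
    finally ctx.markCompleted()

  // The task fails midway; cleanup tied to iterator exhaustion never runs.
  def failWithExhaustionCleanup(): Boolean = {
    val ctx = new FakeTaskContext
    val sorter = new FakeSorter
    runTask(ctx) {
      val it = onExhaustion(Iterator(1, 2, 3), () => sorter.stop())
      it.next()
      throw new RuntimeException("simulated task failure")
    }
    sorter.stopped
  }

  // The task fails midway; the completion listener still stops the sorter.
  def failWithCompletionListener(): Boolean = {
    val ctx = new FakeTaskContext
    val sorter = new FakeSorter
    runTask(ctx) {
      ctx.addCompletionListener(() => sorter.stop())
      val it = Iterator(1, 2, 3)
      it.next()
      throw new RuntimeException("simulated task failure")
    }
    sorter.stopped
  }

  def main(args: Array[String]): Unit = {
    println(s"stopped via exhaustion cleanup: ${failWithExhaustionCleanup()}")
    println(s"stopped via completion listener: ${failWithCompletionListener()}")
  }
}
```

In this model the first variant leaves the sorter running after the failure, while the second stops it, which is the behavior the suggestion aims for.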






[GitHub] [spark] zhengruifeng commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799065102


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799037184








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801561161


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136184/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800729611


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40715/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801207494


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136170/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789657878


   **[Test build #135702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135702/testReport)** for PR 31480 at commit [`b751280`](https://github.com/apache/spark/commit/b751280ec466cd665e88ed8e36df968b2afb82af).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801207494


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136170/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799920824


   Kubernetes integration test unable to build dist.
   
   exiting with code: 141
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40670/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774842883


   **[Test build #134998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594382447



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,
+   * and releases resources by calling `stop()`.
+   */
+  def completionIterator: Iterator[Product2[K, C]] = {
+    context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
+    context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)
+    context.taskMetrics().incPeakExecutionMemory(peakMemoryUsedBytes)
+    // Use completion callback to stop sorter if task was finished/cancelled.
+    context.addTaskCompletionListener[Unit](_ => {
+      stop()
+    })

Review comment:
       nit:
   ```suggestion
       context.addTaskCompletionListener[Unit](_ => stop())
   ```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773813064


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39485/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789575543


   **[Test build #135702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135702/testReport)** for PR 31480 at commit [`b751280`](https://github.com/apache/spark/commit/b751280ec466cd665e88ed8e36df968b2afb82af).




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773839230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134903/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801203783


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40751/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802584105


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136239/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800068829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40681/
   




[GitHub] [spark] Ngone51 closed pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 closed pull request #31480:
URL: https://github.com/apache/spark/pull/31480


   




[GitHub] [spark] zhengruifeng commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r578958152



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)

Review comment:
       I reviewed the related code, and it seems that `sorter.iterator` may spill during traversal:
   `isShuffleSort = false` in `def iterator` makes the internal iterator `destructiveIterator` a `SpillableIterator`.
   
   We can move the `taskMetrics` update into a task completion listener if necessary, but maybe in a new ticket.
   
   As for this PR, I prefer to keep it in line with the existing implementation.
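   
   To illustrate the concern, here is a minimal Spark-free sketch. `ToySorter` and `ToyContext` are hypothetical stand-ins, not Spark classes, and the toy spills unconditionally on every read, whereas a real sorter would presumably spill only under memory pressure. It only shows why a counter read right after `insertAll` can miss spills that happen while the iterator is consumed, while a completion callback sees the final value:
   
   ```scala
   import scala.collection.mutable.ArrayBuffer

   // Hypothetical stand-in for a sorter whose iterator can still spill while being consumed.
   final class ToySorter(data: Seq[Int]) {
     private var spilled = 0L
     def bytesSpilled: Long = spilled
     // Pretend the insert phase spilled 10 bytes.
     def insertAll(): Unit = spilled += 10
     // Pretend every element read spills one more byte.
     def iterator: Iterator[Int] = data.sorted.iterator.map { x =>
       spilled += 1
       x
     }
   }

   // Hypothetical stand-in for a task context that runs callbacks when the task ends.
   final class ToyContext {
     private val listeners = ArrayBuffer.empty[() => Unit]
     var reportedSpill = 0L
     def onCompletion(f: () => Unit): Unit = listeners += f
     def complete(): Unit = listeners.foreach(_.apply())
   }

   object SpillMetricsSketch {
     // Eager: report the counter right after insertAll, before the iterator is consumed.
     def eager(data: Seq[Int]): Long = {
       val ctx = new ToyContext
       val sorter = new ToySorter(data)
       sorter.insertAll()
       ctx.reportedSpill += sorter.bytesSpilled
       sorter.iterator.foreach(_ => ())
       ctx.complete()
       ctx.reportedSpill
     }

     // Deferred: report the counter from a completion callback, after iteration has finished.
     def deferred(data: Seq[Int]): Long = {
       val ctx = new ToyContext
       val sorter = new ToySorter(data)
       sorter.insertAll()
       ctx.onCompletion(() => ctx.reportedSpill += sorter.bytesSpilled)
       sorter.iterator.foreach(_ => ())
       ctx.complete()
       ctx.reportedSpill
     }
   }
   ```
   
   With three input elements, `eager` reports 10 while `deferred` reports 13; the difference is exactly the spill that happened during traversal.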






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774598903








[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594903011



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,
+   * and releases resources by calling `stop()`.
+   */
+  def completionIterator: Iterator[Product2[K, C]] = {
+    context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
+    context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)
+    context.taskMetrics().incPeakExecutionMemory(peakMemoryUsedBytes)

Review comment:
       Ah, I just saw that you already renamed the method. That should be good enough; you can ignore my suggestion.






[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799083659


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40632/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801203783


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40751/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r571518815



##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -860,6 +860,32 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext with Eventually {
     assert(partitions(1) === Seq((1, 3), (3, 8), (3, 8)))
   }
 
+  test("repartitionAndSortWithinPartitions without shuffle") {

Review comment:
       I would add a JIRA prefix here in the test title.






[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594823632



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,
+   * and releases resources by calling `stop()`.
+   */
+  def completionIterator: Iterator[Product2[K, C]] = {
+    context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
+    context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)
+    context.taskMetrics().incPeakExecutionMemory(peakMemoryUsedBytes)

Review comment:
       Yeah, I was thinking about the renaming too. Maybe, `insertAllWithMetricsUpdated`?






[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594381524



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,

Review comment:
       IIUC, we only call `stop()` on task completion; the task metrics updates happen as soon as `completionIterator` is called. Could you reword the doc comment to reflect that?






[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802534480


   Yea, I think so. The Jenkins infra is really unhealthy these days.
   
   For the GA failure, the community has just fixed one failure related to Scala 2.13 in https://github.com/apache/spark/commit/c5cadfefdf9b4c6135355b49366fd9e9d1e3fcd0, but this looks like a different one. cc @sarutak @HyukjinKwon to be sure.




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-802584105


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136239/
   




[GitHub] [spark] mridulm commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r593416332



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        // Use completion callback to stop sorter if task was finished/cancelled.
+        context.addTaskCompletionListener[Unit](_ => sorter.stop)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)
+      }, preservesPartitioning = true)
+    } else {

Review comment:
       Nice fix !

##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -860,6 +860,32 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext with Eventually {
     assert(partitions(1) === Seq((1, 3), (3, 8), (3, 8)))
   }
 
+  test("SPARK-32384: repartitionAndSortWithinPartitions without shuffle") {
+    val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
+
+    class ModPartitioner(val numPartitions: Int) extends Partitioner {
+      def getPartition(key: Any): Int = key.asInstanceOf[Int] % numPartitions
+
+      override def equals(other: Any): Boolean = other match {
+        case h: ModPartitioner => h.numPartitions == this.numPartitions
+        case _ => false
+      }
+
+      override def hashCode: Int = numPartitions
+    }

Review comment:
       Remove this and use `HashPartitioner`?
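   
   For context, the fast path is guarded by `self.partitioner == Some(partitioner)`, so it only triggers when the partitioner defines value equality, which is why the custom class above overrides `equals` and `hashCode` (`HashPartitioner` already compares by `numPartitions`). A minimal Spark-free sketch of that comparison, where `RefPartitioner`, `ValuePartitioner`, and `samePartitioner` are hypothetical names used only for illustration:
   
   ```scala
   // Hypothetical stand-in that inherits reference equality from AnyRef.
   class RefPartitioner(val numPartitions: Int)

   // Hypothetical stand-in with value equality on numPartitions.
   class ValuePartitioner(val numPartitions: Int) {
     override def equals(other: Any): Boolean = other match {
       case p: ValuePartitioner => p.numPartitions == numPartitions
       case _ => false
     }
     override def hashCode: Int = numPartitions
   }

   object PartitionerEqualitySketch {
     // Mirrors the shape of the guard in the diff: compare an optional existing
     // partitioner with the requested one using ==.
     def samePartitioner(existing: Option[Any], requested: Any): Boolean =
       existing == Some(requested)
   }
   ```
   
   Two separately constructed `RefPartitioner(2)` instances do not compare equal, so a guard of this shape would fall back to the shuffle path; two `ValuePartitioner(2)` instances do compare equal.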

##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -860,6 +860,32 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext with Eventually {
     assert(partitions(1) === Seq((1, 3), (3, 8), (3, 8)))
   }
 
+  test("SPARK-32384: repartitionAndSortWithinPartitions without shuffle") {
+    val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
+
+    class ModPartitioner(val numPartitions: Int) extends Partitioner {
+      def getPartition(key: Any): Int = key.asInstanceOf[Int] % numPartitions
+
+      override def equals(other: Any): Boolean = other match {
+        case h: ModPartitioner => h.numPartitions == this.numPartitions
+        case _ => false
+      }
+
+      override def hashCode: Int = numPartitions
+    }
+
+    val partitioner = new ModPartitioner(2)
+    val agged = data.reduceByKey(partitioner, _ + _)
+    assert(agged.partitioner == Some(partitioner))
+
+    val sorted = agged.repartitionAndSortWithinPartitions(partitioner)
+    assert(sorted.partitioner == Some(partitioner))
+
+    val partitions = sorted.glom().collect()
+    assert(partitions(0) === Seq((0, 13), (2, 6)))
+    assert(partitions(1) === Seq((1, 3), (3, 16)))

Review comment:
       This test does not check whether a shuffle was avoided, only that `repartitionAndSortWithinPartitions` works, which is already tested elsewhere.
   You will have to check how many stages are executed to validate that `repartitionAndSortWithinPartitions` became a narrow dependency instead of a shuffle dependency.
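   
   One possible way to check this, sketched here as an assumption rather than as the test that was eventually added: instead of counting executed stages, inspect the lineage and look for a `ShuffleDependency` among the RDD's direct dependencies. The object name `NarrowDependencyCheck` and the helper `isNarrow` are hypothetical; the sketch needs Spark on the classpath, and what it prints for the sorted RDD depends on whether the Spark build includes this change.
   
   ```scala
   import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
   import org.apache.spark.rdd.RDD

   object NarrowDependencyCheck {
     // True if none of the RDD's direct dependencies is a shuffle dependency.
     def isNarrow(rdd: RDD[_]): Boolean =
       rdd.dependencies.forall(dep => !dep.isInstanceOf[ShuffleDependency[_, _, _]])

     def main(args: Array[String]): Unit = {
       val conf = new SparkConf().setMaster("local[2]").setAppName("narrow-dep-check")
       val sc = new SparkContext(conf)
       try {
         val partitioner = new HashPartitioner(2)
         val data = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)), 2)
         // The input has no partitioner, so reduceByKey has to shuffle here.
         val agged = data.reduceByKey(partitioner, _ + _)
         // Same partitioner as `agged`: the case this PR is about.
         val sorted = agged.repartitionAndSortWithinPartitions(partitioner)
         println(s"reduceByKey is narrow: ${isNarrow(agged)}")
         println(s"repartitionAndSortWithinPartitions is narrow: ${isNarrow(sorted)}")
       } finally {
         sc.stop()
       }
     }
   }
   ```
   
   A lineage check like this is cheaper than running a job, but it only looks one level up; counting stages through a listener would validate the executed plan end to end.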






[GitHub] [spark] zhengruifeng commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r594806172



##########
File path: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
##########
@@ -672,6 +672,22 @@ private[spark] class ExternalSorter[K, V, C](
     partitionedIterator.flatMap(pair => pair._2)
   }
 
+  /**
+   * Return an iterator over all the data written to this object, aggregated by our aggregator.
+   * On task completion (success, failure, or cancellation), it updates related task metrics,
+   * and releases resources by calling `stop()`.
+   */
+  def completionIterator: Iterator[Product2[K, C]] = {
+    context.taskMetrics().incMemoryBytesSpilled(memoryBytesSpilled)
+    context.taskMetrics().incDiskBytesSpilled(diskBytesSpilled)
+    context.taskMetrics().incPeakExecutionMemory(peakMemoryUsedBytes)

Review comment:
       I am OK with adding `records` here, but I would prefer another method name: `insertAllAndUpdateMetrics`.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773839230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134903/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799050165


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40629/
   




[GitHub] [spark] Ngone51 commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800006051


   retest this please




[GitHub] [spark] dongjoon-hyun commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-803478816


   +1, late LGTM. Thank you all!




[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r578979477



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,23 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)

Review comment:
       SGTM






[GitHub] [spark] Ngone51 commented on a change in pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31480:
URL: https://github.com/apache/spark/pull/31480#discussion_r571780987



##########
File path: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
##########
@@ -73,7 +75,21 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
    * because it can push the sorting down into the shuffle machinery.
    */
   def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
-    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
+    if (self.partitioner == Some(partitioner)) {
+      self.mapPartitions(iter => {
+        val context = TaskContext.get
+        val sorter = new ExternalSorter[K, V, V](context, None, None, Some(ordering))
+        sorter.insertAll(iter)
+        context.taskMetrics.incDiskBytesSpilled(sorter.diskBytesSpilled)
+        context.taskMetrics.incMemoryBytesSpilled(sorter.memoryBytesSpilled)
+        context.taskMetrics.incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+        val outputIter = new InterruptibleIterator(context,
+          sorter.iterator.asInstanceOf[Iterator[(K, V)]])
+        CompletionIterator[(K, V), Iterator[(K, V)]](outputIter, sorter.stop)

Review comment:
       Do we still need this?
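
   For readers following the diff above, the idea under review can be sketched without Spark. This is a minimal illustration, not the patch itself: `HashPart`, `PartitionedData`, and `repartitionAndSort` are hypothetical stand-ins built on plain Scala collections, and an in-memory `sortBy` is swapped in for the `ExternalSorter` used in the diff. It only shows the branch being discussed: when the data is already laid out by the requested partitioner, records stay where they are and each partition is sorted locally; otherwise every record is redistributed first.

```scala
object ShuffleSkipSketch {
  // Hypothetical stand-in for a hash partitioner (not Spark's class).
  final case class HashPart(numPartitions: Int) {
    def getPartition(key: Any): Int = {
      val m = key.hashCode % numPartitions
      if (m < 0) m + numPartitions else m
    }
  }

  // Hypothetical stand-in for a keyed dataset: its partitions plus the
  // partitioner (if any) that produced the current layout.
  final case class PartitionedData[K, V](
      partitions: Vector[Vector[(K, V)]],
      partitioner: Option[HashPart])

  def repartitionAndSort[K: Ordering, V](
      data: PartitionedData[K, V],
      target: HashPart): PartitionedData[K, V] = {
    val placed =
      if (data.partitioner.contains(target)) {
        // Same partitioner: every record is already in the right partition,
        // so no redistribution is needed.
        data.partitions
      } else {
        // Different or unknown partitioner: move every record to the
        // partition chosen by the target (the shuffle step).
        val all = data.partitions.flatten
        Vector.tabulate(target.numPartitions) { p =>
          all.filter { case (k, _) => target.getPartition(k) == p }
        }
      }
    // In both branches each partition is sorted by key independently.
    PartitionedData(placed.map(_.sortBy(_._1)), Some(target))
  }
}
```

   Calling `ShuffleSkipSketch.repartitionAndSort(data, p)` with `data.partitioner == Some(p)` leaves the partition boundaries untouched and only reorders records inside each partition.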






[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774887363


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39581/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799074055


   **[Test build #136049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136049/testReport)** for PR 31480 at commit [`51fac28`](https://github.com/apache/spark/commit/51fac28e18da2dac4f3bd0ef99fc54ce0c49f928).




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-797361967


   **[Test build #136000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136000/testReport)** for PR 31480 at commit [`a3b9063`](https://github.com/apache/spark/commit/a3b90638c15239766c2876f5f999f970914470b6).




[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-799074055


   **[Test build #136049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136049/testReport)** for PR 31480 at commit [`51fac28`](https://github.com/apache/spark/commit/51fac28e18da2dac4f3bd0ef99fc54ce0c49f928).




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-801560509


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40766/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-789623041


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40284/
   




[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-800041303


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40681/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-773813064


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39485/
   

