You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/23 09:13:48 UTC

[GitHub] [spark] AngersZhuuuu opened a new pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

AngersZhuuuu opened a new pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988
 
 
   ### What changes were proposed in this pull request?
   
   When last partition's splitFile's split size is larger then  maxSize, this partition will be lost
   
   ### Why are the changes needed?
   Fix bug
   
   
   ### Does this PR introduce any user-facing change?
   NO
   
   ### How was this patch tested?
   NO
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448530
 
 
   Sorry for late visiting here, @AngersZhuuuu .
   BTW, please use PR description instead of comment (https://github.com/apache/spark/pull/27988#issuecomment-602974768).
   - The PR description will be a commit log with your patch.
   - A comment on PR is hard to find later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25389/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449944
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120689/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451012
 
 
   Retest this please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451791
 
 
   **[Test build #120690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529674
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120690/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r402014223
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
 ##########
 @@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
       val splitSize = fileSplit.getLength
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
-        index += 1
-        if (index == partitions.size) {
-          updateGroups
-        }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
-          index += 1
         } else {
-          updateGroups
+          updateGroups()
+          addPartition(partition, splitSize)
         }
       }
+      index += 1
+      if (index == partitions.length) {
+        updateGroups()
+      }
 
 Review comment:
   > BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?
   
   Yeah, these days I am testing a way to  fix spark small out put file problem, and I use this logic, during test, found that this logic is wrong.
   https://github.com/apache/spark/pull/27248

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602475945
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-609792200
 
 
   @dongjoon-hyun  Any more need to add for this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529669
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602974768
 
 
   @dongjoon-hyun 
   ```
   
     // 3. since index = partition.size now,  jump out of the loop , then the last partition is lost since we won't call updatePartition() again.
   while (index < partitions.size) {
        //  we assume that when index = partitions.length -1  and 
         //  splitSize > maxSize
         val partition = partitions(index)
         val fileSplit =
           partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
         val splitSize = fileSplit.getLength
         if (currentSum + splitSize < maxSize) {
           addPartition(partition, splitSize)
           index += 1
           if (index == partitions.size) {
             updateGroups
           }
         } else {
          //  2.then  will addPartition() to currentGroup and index updated , index = partition.size now
           if (currentGroup.partitions.size == 0) {
             addPartition(partition, splitSize)
             index += 1
           } else {
           //   1. will call updateGroups() here first, and index won't update
             updateGroups
           }
         }
       }
       groups.toArray
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602476612
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449918
 
 
   **[Test build #120689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
    * This patch **fails build dependency tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448947
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25388/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448295
 
 
   **[Test build #120689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448939
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451791
 
 
   **[Test build #120690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529669
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452301
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449944
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120689/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25389/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449933
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448947
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25388/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607582118
 
 
   > Sorry for late visiting here, @AngersZhuuuu .
   > BTW, please use PR description instead of comment ([#27988 (comment)](https://github.com/apache/spark/pull/27988#issuecomment-602974768)).
   > 
   > * The PR description will be a commit log with your patch.
   > * A comment on PR is hard to find later.
   
   Changed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452301
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r402014223
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
 ##########
 @@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
       val splitSize = fileSplit.getLength
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
-        index += 1
-        if (index == partitions.size) {
-          updateGroups
-        }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
-          index += 1
         } else {
-          updateGroups
+          updateGroups()
+          addPartition(partition, splitSize)
         }
       }
+      index += 1
+      if (index == partitions.length) {
+        updateGroups()
+      }
 
 Review comment:
   > BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?
   
   Yeah, these days I am testing a way to  fix spark small out put file problem, and I use this logic, during test, found that this logic is wrong.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r401861089
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
 ##########
 @@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
       val splitSize = fileSplit.getLength
       if (currentSum + splitSize < maxSize) {
         addPartition(partition, splitSize)
-        index += 1
-        if (index == partitions.size) {
-          updateGroups
-        }
       } else {
-        if (currentGroup.partitions.size == 0) {
+        if (currentGroup.partitions.isEmpty) {
           addPartition(partition, splitSize)
-          index += 1
         } else {
-          updateGroups
+          updateGroups()
+          addPartition(partition, splitSize)
         }
       }
+      index += 1
+      if (index == partitions.length) {
+        updateGroups()
+      }
 
 Review comment:
   BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529674
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120690/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602974768
 
 
   @dongjoon-hyun 
   ```
   
     // 3. since index = partition.size now,  jump out of the loop , then the last partition is lost since we won't call updatePartition() again.
   while (index < partitions.size) {
        //  we assume that when index = partitions.length -1  and 
         //  splitSize > maxSize
         val partition = partitions(index)
         val fileSplit =
           partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
         val splitSize = fileSplit.getLength
         if (currentSum + splitSize < maxSize) {
           addPartition(partition, splitSize)
           index += 1
           if (index == partitions.size) {
             updateGroups
           }
         } else {
          //  2.then  will addPartition() to currentGroup and index updated , index = partition.size now
           if (currentGroup.partitions.size == 0) {
             addPartition(partition, splitSize)
             index += 1
           } else {
           //   1. will call updateGroups() here first, and index won't update
             updateGroups
           }
         }
       }
       groups.toArray
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602476612
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448939
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448295
 
 
   **[Test build #120689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607447566
 
 
   ok to test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529010
 
 
   **[Test build #120690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602475945
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449933
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org