You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/23 09:13:48 UTC
[GitHub] [spark] AngersZhuuuu opened a new pull request #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
AngersZhuuuu opened a new pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988
### What changes were proposed in this pull request?
When last partition's splitFile's split size is larger then maxSize, this partition will be lost
### Why are the changes needed?
Fix bug
### Does this PR introduce any user-facing change?
NO
### How was this patch tested?
NO
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448530
Sorry for late visiting here, @AngersZhuuuu .
BTW, please use PR description instead of comment (https://github.com/apache/spark/pull/27988#issuecomment-602974768).
- The PR description will be a commit log with your patch.
- A comment on PR is hard to find later.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452309
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25389/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449944
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120689/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451012
Retest this please.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451791
**[Test build #120690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529674
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120690/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r402014223
##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
val splitSize = fileSplit.getLength
if (currentSum + splitSize < maxSize) {
addPartition(partition, splitSize)
- index += 1
- if (index == partitions.size) {
- updateGroups
- }
} else {
- if (currentGroup.partitions.size == 0) {
+ if (currentGroup.partitions.isEmpty) {
addPartition(partition, splitSize)
- index += 1
} else {
- updateGroups
+ updateGroups()
+ addPartition(partition, splitSize)
}
}
+ index += 1
+ if (index == partitions.length) {
+ updateGroups()
+ }
Review comment:
> BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?
Yeah, these days I am testing a way to fix spark small out put file problem, and I use this logic, during test, found that this logic is wrong.
https://github.com/apache/spark/pull/27248
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602475945
Can one of the admins verify this patch?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-609792200
@dongjoon-hyun Any more need to add for this?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529669
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602974768
@dongjoon-hyun
```
// 3. since index = partition.size now, jump out of the loop , then the last partition is lost since we won't call updatePartition() again.
while (index < partitions.size) {
// we assume that when index = partitions.length -1 and
// splitSize > maxSize
val partition = partitions(index)
val fileSplit =
partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
val splitSize = fileSplit.getLength
if (currentSum + splitSize < maxSize) {
addPartition(partition, splitSize)
index += 1
if (index == partitions.size) {
updateGroups
}
} else {
// 2.then will addPartition() to currentGroup and index updated , index = partition.size now
if (currentGroup.partitions.size == 0) {
addPartition(partition, splitSize)
index += 1
} else {
// 1. will call updateGroups() here first, and index won't update
updateGroups
}
}
}
groups.toArray
}
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602476612
Can one of the admins verify this patch?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST]
SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449918
**[Test build #120689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448947
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25388/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST]
SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448295
**[Test build #120689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448939
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607451791
**[Test build #120690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529669
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452301
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449944
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120689/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452309
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25389/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449933
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448947
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25388/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607582118
> Sorry for late visiting here, @AngersZhuuuu .
> BTW, please use PR description instead of comment ([#27988 (comment)](https://github.com/apache/spark/pull/27988#issuecomment-602974768)).
>
> * The PR description will be a commit log with your patch.
> * A comment on PR is hard to find later.
Changed
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607452301
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r402014223
##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
val splitSize = fileSplit.getLength
if (currentSum + splitSize < maxSize) {
addPartition(partition, splitSize)
- index += 1
- if (index == partitions.size) {
- updateGroups
- }
} else {
- if (currentGroup.partitions.size == 0) {
+ if (currentGroup.partitions.isEmpty) {
addPartition(partition, splitSize)
- index += 1
} else {
- updateGroups
+ updateGroups()
+ addPartition(partition, splitSize)
}
}
+ index += 1
+ if (index == partitions.length) {
+ updateGroups()
+ }
Review comment:
> BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?
Yeah, these days I am testing a way to fix spark small out put file problem, and I use this logic, during test, found that this logic is wrong.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#discussion_r401861089
##########
File path: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala
##########
@@ -1298,18 +1298,18 @@ class SizeBasedCoalescer(val maxSize: Int) extends PartitionCoalescer with Seria
val splitSize = fileSplit.getLength
if (currentSum + splitSize < maxSize) {
addPartition(partition, splitSize)
- index += 1
- if (index == partitions.size) {
- updateGroups
- }
} else {
- if (currentGroup.partitions.size == 0) {
+ if (currentGroup.partitions.isEmpty) {
addPartition(partition, splitSize)
- index += 1
} else {
- updateGroups
+ updateGroups()
+ addPartition(partition, splitSize)
}
}
+ index += 1
+ if (index == partitions.length) {
+ updateGroups()
+ }
Review comment:
BTW, the previous code was added at 2.0.0. Did you hit a test flakiness due to this?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529674
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120690/
Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602974768
@dongjoon-hyun
```
// 3. since index = partition.size now, jump out of the loop , then the last partition is lost since we won't call updatePartition() again.
while (index < partitions.size) {
// we assume that when index = partitions.length -1 and
// splitSize > maxSize
val partition = partitions(index)
val fileSplit =
partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
val splitSize = fileSplit.getLength
if (currentSum + splitSize < maxSize) {
addPartition(partition, splitSize)
index += 1
if (index == partitions.size) {
updateGroups
}
} else {
// 2.then will addPartition() to currentGroup and index updated , index = partition.size now
if (currentGroup.partitions.size == 0) {
addPartition(partition, splitSize)
index += 1
} else {
// 1. will call updateGroups() here first, and index won't update
updateGroups
}
}
}
groups.toArray
}
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602476612
Can one of the admins verify this patch?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448939
Merged build finished. Test PASSed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607448295
**[Test build #120689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120689/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607447566
ok to test
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27988:
[SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27988: [SPARK-31226][CORE][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607529010
**[Test build #120690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120690/testReport)** for PR 27988 at commit [`2c4ef14`](https://github.com/apache/spark/commit/2c4ef1431cec3b82da3b40fb7fb4a92f4a10e9d4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-602475945
Can one of the admins verify this patch?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27988:
[SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27988: [SPARK-31226][SQL][TEST] SizeBasedCoalesce logic will lose partition
URL: https://github.com/apache/spark/pull/27988#issuecomment-607449933
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org