Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/09/06 18:20:45 UTC
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/22356
[SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles ignore minPartitions parameter
## What changes were proposed in this pull request?
This adds a test following https://github.com/apache/spark/pull/21638
## How was this patch tested?
Existing tests and new test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-22357.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22356.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22356
----
commit 84dd4a7bc0eba8b04bb5cf53d73042ac5078d611
Author: Sean Owen <se...@...>
Date: 2018-09-06T18:19:02Z
Add test for binaryFiles minPartitions
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2912/
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2914/
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/22356
Merged to master/2.4
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22356
**[Test build #95771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95771/testReport)** for PR 22356 at commit [`6e1d8fd`](https://github.com/apache/spark/commit/6e1d8fd43091d7b8ad83bda18f4b3701a829dc10).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/22356#discussion_r215730938
--- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
@@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
}
}
+  test("SPARK-22357 test binaryFiles minPartitions") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
+      .set("spark.files.openCostInBytes", "0")
--- End diff --
Why is this setting needed: `spark.files.openCostInBytes`?
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22356
**[Test build #95769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95769/testReport)** for PR 22356 at commit [`84dd4a7`](https://github.com/apache/spark/commit/84dd4a7bc0eba8b04bb5cf53d73042ac5078d611).
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22356#discussion_r215737727
--- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
@@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
}
}
+  test("SPARK-22357 test binaryFiles minPartitions") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
+      .set("spark.files.openCostInBytes", "0")
+      .set("spark.default.parallelism", "1"))
+
+    val tempDir = Utils.createTempDir()
+    val tempDirPath = tempDir.getAbsolutePath
+
+    for (i <- 0 until 8) {
+      val tempFile = new File(tempDir, s"part-0000$i")
+      Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1", tempFile,
+        StandardCharsets.UTF_8)
+    }
+
+    assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
--- End diff --
OK, sure
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by bomeng <gi...@git.apache.org>.
Github user bomeng commented on the issue:
https://github.com/apache/spark/pull/22356
Thanks for taking my code. Looks good.
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/22356#discussion_r215749840
--- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
@@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
}
}
+  test("SPARK-22357 test binaryFiles minPartitions") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
+      .set("spark.files.openCostInBytes", "0")
--- End diff --
Ah, I see. Thanks for pointing that out!
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22356
**[Test build #95771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95771/testReport)** for PR 22356 at commit [`6e1d8fd`](https://github.com/apache/spark/commit/6e1d8fd43091d7b8ad83bda18f4b3701a829dc10).
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22356#discussion_r215737907
--- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
@@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
}
}
+  test("SPARK-22357 test binaryFiles minPartitions") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
+      .set("spark.files.openCostInBytes", "0")
--- End diff --
Setting it to 0 removes the open-cost term from the split-size calculation, which is the section of code we're really trying to test:
```
def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
  val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
  val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
  val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
  val files = listStatus(context).asScala
  val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
  val bytesPerCore = totalBytes / defaultParallelism
  val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
  super.setMaxSplitSize(maxSplitSize)
}
```
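To make the effect concrete, here is a minimal standalone sketch of the same arithmetic, with no Spark dependency. `SplitSizeSketch` and its `maxSplitSize` function are hypothetical names for illustration only. The 128 MiB and 4 MiB values are assumed stand-ins for `FILES_MAX_PARTITION_BYTES` and `FILES_OPEN_COST_IN_BYTES` when left unset. The 56-byte file length is the size of the three-line string the test writes.

```scala
// Standalone model of the split-size arithmetic quoted above (illustrative only).
object SplitSizeSketch {
  // Each file length is padded by openCostInBytes, the padded total is divided
  // by the effective parallelism, and the result is clamped to the range
  // [openCostInBytes, defaultMaxSplitBytes].
  def maxSplitSize(
      fileLengths: Seq[Long],
      defaultMaxSplitBytes: Long,
      openCostInBytes: Long,
      defaultParallelism: Int,
      minPartitions: Int): Long = {
    val parallelism = math.max(defaultParallelism, minPartitions)
    val totalBytes = fileLengths.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }

  def main(args: Array[String]): Unit = {
    val files = Seq.fill(8)(56L)        // eight 56-byte files, as in the test
    val maxBytes = 128L * 1024 * 1024   // assumed cap of 128 MiB
    val openCost = 4L * 1024 * 1024     // assumed open cost of 4 MiB

    for (p <- Seq(1, 2, 8)) {
      val padded = maxSplitSize(files, maxBytes, openCost, 1, p)
      val unpadded = maxSplitSize(files, maxBytes, 0L, 1, p)
      println(s"minPartitions=$p withOpenCost=$padded withoutOpenCost=$unpadded")
    }
  }
}
```

Under these assumptions, with `minPartitions = 8` the padded split size is 4194360 bytes, far more than the 448 bytes of real data, while with an open cost of 0 it is 56 bytes, exactly one file. That gap is presumably why the test zeroes the setting: otherwise the open cost would dominate the tiny test files.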
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/22356#discussion_r215730668
--- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
@@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
}
}
+  test("SPARK-22357 test binaryFiles minPartitions") {
+    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
+      .set("spark.files.openCostInBytes", "0")
+      .set("spark.default.parallelism", "1"))
+
+    val tempDir = Utils.createTempDir()
+    val tempDirPath = tempDir.getAbsolutePath
+
+    for (i <- 0 until 8) {
+      val tempFile = new File(tempDir, s"part-0000$i")
+      Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1", tempFile,
+        StandardCharsets.UTF_8)
+    }
+
+    assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
--- End diff --
Nitpick: maybe put these three asserts in a loop.
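A rough sketch of what such a loop could look like follows. Only the `minPartitions = 1` assertion is visible in the quoted diff, so the other two values (2 and 8) are assumptions. `checkPartitionCounts` is a hypothetical helper, not part of Spark or of this patch. The partition-count lookup is passed in as a function so the sketch runs without a Spark dependency.

```scala
object AssertLoopSketch {
  // Hypothetical helper: asserts that each requested minPartitions value
  // yields exactly that many partitions.
  def checkPartitionCounts(requested: Seq[Int])(numPartitions: Int => Int): Unit =
    for (p <- requested) {
      val actual = numPartitions(p)
      assert(actual == p, s"minPartitions=$p produced $actual partitions")
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for the Spark call so the sketch is runnable on its own.
    checkPartitionCounts(Seq(1, 2, 8))(p => p)
    println("all partition-count checks passed")
  }
}
```

In the test itself, the function argument would be `p => sc.binaryFiles(tempDirPath, minPartitions = p).getNumPartitions`.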
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/22356
CC @bomeng @gatorsmile
---
[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22356
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95769/
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95771/
---
[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22356
**[Test build #95769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95769/testReport)** for PR 22356 at commit [`84dd4a7`](https://github.com/apache/spark/commit/84dd4a7bc0eba8b04bb5cf53d73042ac5078d611).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---