Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/09/06 18:20:45 UTC

[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/22356

    [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles ignore minPartitions parameter

    ## What changes were proposed in this pull request?
    
    This adds a test following https://github.com/apache/spark/pull/21638
    
    ## How was this patch tested?
    
    Existing tests and new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-22357.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22356.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22356
    
----
commit 84dd4a7bc0eba8b04bb5cf53d73042ac5078d611
Author: Sean Owen <se...@...>
Date:   2018-09-06T18:19:02Z

    Add test for binaryFiles minPartitions

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2912/


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2914/


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Merged to master/2.4


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    **[Test build #95771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95771/testReport)** for PR 22356 at commit [`6e1d8fd`](https://github.com/apache/spark/commit/6e1d8fd43091d7b8ad83bda18f4b3701a829dc10).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22356#discussion_r215730938
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         }
       }
     
    +  test("SPARK-22357 test binaryFiles minPartitions") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    +      .set("spark.files.openCostInBytes", "0")
    --- End diff --
    
    Why is this setting needed: spark.files.openCostInBytes?


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    **[Test build #95769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95769/testReport)** for PR 22356 at commit [`84dd4a7`](https://github.com/apache/spark/commit/84dd4a7bc0eba8b04bb5cf53d73042ac5078d611).


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22356#discussion_r215737727
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         }
       }
     
    +  test("SPARK-22357 test binaryFiles minPartitions") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    +      .set("spark.files.openCostInBytes", "0")
    +      .set("spark.default.parallelism", "1"))
    +
    +    val tempDir = Utils.createTempDir()
    +    val tempDirPath = tempDir.getAbsolutePath
    +
    +    for (i <- 0 until 8) {
    +      val tempFile = new File(tempDir, s"part-0000$i")
    +      Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1", tempFile,
    +        StandardCharsets.UTF_8)
    +    }
    +
    +    assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
    --- End diff --
    
    OK, sure


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by bomeng <gi...@git.apache.org>.
Github user bomeng commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Thanks for taking my code. Looks good.


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22356#discussion_r215749840
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         }
       }
     
    +  test("SPARK-22357 test binaryFiles minPartitions") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    +      .set("spark.files.openCostInBytes", "0")
    --- End diff --
    
    ah, I see, thanks for pointing that out!


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    **[Test build #95771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95771/testReport)** for PR 22356 at commit [`6e1d8fd`](https://github.com/apache/spark/commit/6e1d8fd43091d7b8ad83bda18f4b3701a829dc10).


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22356#discussion_r215737907
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         }
       }
     
    +  test("SPARK-22357 test binaryFiles minPartitions") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    +      .set("spark.files.openCostInBytes", "0")
    --- End diff --
    
    Setting it to 0 removes the open cost's effect in the section of code we're really trying to test:
    
    ```
    def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
      val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
      val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
      val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
      val files = listStatus(context).asScala
      val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
      val bytesPerCore = totalBytes / defaultParallelism
      val maxSplitSize = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
      super.setMaxSplitSize(maxSplitSize)
    }
    ```
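
To make the effect of the open cost concrete, here is a minimal standalone sketch of the arithmetic quoted above. It is not Spark code: `SplitSizeSketch` and its `maxSplitSize` helper are hypothetical names, and plain `Long` values stand in for `SparkContext` and `JobContext`. The 56-byte file length is the UTF-8 size of the string written in the test. The 128 MB and 4 MB values are assumed to be the defaults of `spark.files.maxPartitionBytes` and `spark.files.openCostInBytes`, and the `minPartitions` values 2 and 8 are illustrative choices. How the resulting split size maps to a partition count is up to the underlying input format and is not modeled here.

```scala
object SplitSizeSketch {
  // Same arithmetic as the quoted setMinPartitions, on plain values (hypothetical helper).
  def maxSplitSize(
      fileLengths: Seq[Long],
      openCostInBytes: Long,
      defaultMaxSplitBytes: Long,
      defaultParallelism: Int,
      minPartitions: Int): Long = {
    val parallelism = math.max(defaultParallelism, minPartitions)
    // Every file is padded by the open cost before summing.
    val totalBytes = fileLengths.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }

  def main(args: Array[String]): Unit = {
    val files = Seq.fill(8)(56L)       // eight 56-byte files, as written by the test
    val maxBytes = 128L * 1024 * 1024  // assumed default for spark.files.maxPartitionBytes
    val openCost = 4L * 1024 * 1024    // assumed default for spark.files.openCostInBytes
    for (minPartitions <- Seq(1, 2, 8)) {
      val withZero = maxSplitSize(files, 0L, maxBytes, 1, minPartitions)
      val withDefault = maxSplitSize(files, openCost, maxBytes, 1, minPartitions)
      println(s"minPartitions=$minPartitions openCost=0: $withZero, openCost=4MB: $withDefault")
    }
  }
}
```

With the open cost at 0, the split size follows the real file bytes divided by `minPartitions` (448, 224, 56). With the assumed 4 MB open cost, the padding dominates and the 56-byte payloads barely register, which is why the test zeroes the setting.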


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22356#discussion_r215730668
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -299,6 +301,25 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         }
       }
     
    +  test("SPARK-22357 test binaryFiles minPartitions") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local")
    +      .set("spark.files.openCostInBytes", "0")
    +      .set("spark.default.parallelism", "1"))
    +
    +    val tempDir = Utils.createTempDir()
    +    val tempDirPath = tempDir.getAbsolutePath
    +
    +    for (i <- 0 until 8) {
    +      val tempFile = new File(tempDir, s"part-0000$i")
    +      Files.write("someline1 in file1\nsomeline2 in file1\nsomeline3 in file1", tempFile,
    +        StandardCharsets.UTF_8)
    +    }
    +
    +    assert(sc.binaryFiles(tempDirPath, minPartitions = 1).getNumPartitions === 1)
    --- End diff --
    
    nitpick: maybe put these three asserts in a loop


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    CC @bomeng @gatorsmile 


---



[GitHub] spark pull request #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binary...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22356


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95769/


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95771/


---



[GitHub] spark issue #22356: [SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22356
  
    **[Test build #95769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95769/testReport)** for PR 22356 at commit [`84dd4a7`](https://github.com/apache/spark/commit/84dd4a7bc0eba8b04bb5cf53d73042ac5078d611).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
