Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/10/05 16:00:19 UTC

[GitHub] spark pull request #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsR...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/22643

    [SPARK-25630][TEST] Reduce test time of HadoopFsRelationTest

    ## What changes were proposed in this pull request?
    There are 5 suites extending `HadoopFsRelationTest`, for testing the "orc"/"parquet"/"text"/"json" data sources.
    This PR refactors the base trait `HadoopFsRelationTest`:
    1. Remove the unnecessary loop for setting the Parquet conf.
    2. The test case `SPARK-8406: Avoids name collision while writing files` takes about 14 to 20 seconds. Since all the file format data sources now use common code for creating result files, we can test only one data source (Parquet) to reduce test time.
    
    The total test run time is reduced from 6 minutes to 4.5 minutes.
    
    ## How was this patch tested?
    
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark refactorHadoopFsRelationTest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22643
    
----
commit 9a74db0195ca775878b5fec65fe38928c09c1792
Author: Gengliang Wang <ge...@...>
Date:   2018-10-05T15:53:33Z

    refactor HadoopFsRelationTest

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsR...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22643#discussion_r223115147
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala ---
    @@ -114,10 +118,21 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
         new UDT.MyDenseVectorUDT()
       ).filter(supportsDataType)
     
    +  private val parquetDictionaryEncodingEnabledConfs = if (isParquetDataSource) {
    +    // Run with/without Parquet dictionary encoding enabled for Parquet data source.
    +    Seq(true, false)
    +  } else {
    +    Seq(false)
    +  }
    +
       for (dataType <- supportedDataTypes) {
    -    for (parquetDictionaryEncodingEnabled <- Seq(true, false)) {
    -      test(s"test all data types - $dataType with parquet.enable.dictionary = " +
    -        s"$parquetDictionaryEncodingEnabled") {
    +    for (parquetDictionaryEncodingEnabled <- parquetDictionaryEncodingEnabledConfs) {
    +      val extraMessage = if (isParquetDataSource) {
    +        s" with parquet.enable.dictionary = $parquetDictionaryEncodingEnabled"
    +      } else {
    +        ""
    +      }
    +      test(s"test all data types - $dataType$extraMessage") {
    --- End diff --
    
    This PR seems to accidentally disable the `parquet.enable.dictionary = true` cases even in `ParquetHadoopFsRelationSuite`. Could you fix that? After fixing that, we need to measure the time reduction again.
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97000/consoleFull
    ```scala
    [info] ParquetHadoopFsRelationSuite:
    [info] - test all data types - StringType (830 milliseconds)
    ...
    ```
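    
    The Jenkins line quoted above carries no ` with parquet.enable.dictionary = ...` suffix, which, per the diff, is the name produced when `isParquetDataSource` is false. A minimal standalone sketch of that naming logic follows. It is not code from the PR: the `TestNameSketch` object and the `testNames` helper are hypothetical, and `isParquetDataSource` is passed in as a parameter purely for illustration.
    
    ```scala
    object TestNameSketch {
      // Mirrors the naming logic in the diff; `isParquetDataSource` is an explicit
      // parameter here (an assumption made to keep the sketch standalone).
      def testNames(dataTypes: Seq[String], isParquetDataSource: Boolean): Seq[String] = {
        // Parquet runs both dictionary settings; other sources run a single pass.
        val dictionaryConfs = if (isParquetDataSource) Seq(true, false) else Seq(false)
        for {
          dataType <- dataTypes
          enabled <- dictionaryConfs
        } yield {
          val extraMessage =
            if (isParquetDataSource) s" with parquet.enable.dictionary = $enabled" else ""
          s"test all data types - $dataType$extraMessage"
        }
      }

      def main(args: Array[String]): Unit = {
        // Two names with the suffix are expected for a Parquet suite.
        testNames(Seq("StringType"), isParquetDataSource = true).foreach(println)
        // A single name without the suffix is expected for any other suite.
        testNames(Seq("StringType"), isParquetDataSource = false).foreach(println)
      }
    }
    ```
    
    Under this sketch, a Parquet suite should list two `StringType` cases, each with the suffix, so a log entry without it suggests the non-Parquet branch was taken.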


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    retest this please.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97000/
    Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3747/
    Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    @dongjoon-hyun please take another look, thanks!


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97068/testReport)** for PR 22643 at commit [`59ca9e0`](https://github.com/apache/spark/commit/59ca9e0f2fd6234217f63c25c41a477c4e435b50).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3760/
    Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97000/testReport)** for PR 22643 at commit [`9a74db0`](https://github.com/apache/spark/commit/9a74db0195ca775878b5fec65fe38928c09c1792).


---



[GitHub] spark pull request #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsR...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22643#discussion_r223155383
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala ---
    @@ -760,23 +775,27 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
       // requirement.  We probably want to move this test case to spark-integration-tests or spark-perf
       // later.
       test("SPARK-8406: Avoids name collision while writing files") {
    --- End diff --
    
    +1


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3718/
    Test PASSed.


---



[GitHub] spark pull request #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsR...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22643#discussion_r223146731
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala ---
    @@ -760,23 +775,27 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
       // requirement.  We probably want to move this test case to spark-integration-tests or spark-perf
       // later.
       test("SPARK-8406: Avoids name collision while writing files") {
    --- End diff --
    
    Just move this to ParquetHadoopFsRelationSuite.scala
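    
    A minimal sketch of why moving the test has this effect, assuming the usual pattern in which tests declared in a shared base class are registered once per concrete suite. The `BaseSketch`, `JsonSketch`, and `ParquetSketch` classes and the `test` helper are hypothetical stand-ins, not the real Spark or ScalaTest classes.
    
    ```scala
    import scala.collection.mutable.ListBuffer

    // Hypothetical stand-in for the shared base class; no ScalaTest dependency.
    abstract class BaseSketch {
      val registered: ListBuffer[String] = ListBuffer.empty

      // Records the test name only; the body is never run in this sketch.
      protected def test(name: String)(body: => Unit): Unit = {
        registered += name
        ()
      }

      // Declared in the base class, so every concrete suite registers it.
      test("shared test") {}
    }

    // A suite for another data source inherits only the shared tests.
    class JsonSketch extends BaseSketch

    // The slow test lives in the Parquet suite, so it is registered once overall.
    class ParquetSketch extends BaseSketch {
      test("SPARK-8406: Avoids name collision while writing files") {}
    }
    ```
    
    With this layout, `new JsonSketch().registered` holds only the shared test, while `new ParquetSketch().registered` also holds the SPARK-8406 case.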


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97068/
    Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97052/testReport)** for PR 22643 at commit [`59ca9e0`](https://github.com/apache/spark/commit/59ca9e0f2fd6234217f63c25c41a477c4e435b50).


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97052/
    Test FAILed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97000/testReport)** for PR 22643 at commit [`9a74db0`](https://github.com/apache/spark/commit/9a74db0195ca775878b5fec65fe38928c09c1792).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsR...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22643


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    LGTM
    
    Thanks! Merged to master.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97052/testReport)** for PR 22643 at commit [`59ca9e0`](https://github.com/apache/spark/commit/59ca9e0f2fd6234217f63c25c41a477c4e435b50).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22643: [SPARK-25630][TEST] Reduce test time of HadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22643
  
    **[Test build #97068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97068/testReport)** for PR 22643 at commit [`59ca9e0`](https://github.com/apache/spark/commit/59ca9e0f2fd6234217f63c25c41a477c4e435b50).


---
