Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/28 04:40:00 UTC

[GitHub] [spark] beliefer opened a new pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

beliefer opened a new pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060
 
 
   ### What changes were proposed in this pull request?
   `SQLQueryTestSuite` takes about 35 minutes to run.
   I've listed the 10 test suites that took the longest time in the `SQL` module below.
   
   Class | Time spent | Failed | Skipped | Passed | Total test cases
   -- | -- | -- | -- | -- | --
   SQLQueryTestSuite | 35 minutes | 0 | 1 | 230 | 231
   TPCDSQuerySuite | 3 minutes 8 seconds | 0 | 0 | 156 | 156
   SQLQuerySuite | 2 minutes 52 seconds | 0 | 0 | 185 | 185
   DynamicPartitionPruningSuiteAEOff | 1 minutes 52 seconds | 0 | 0 | 22 | 22
   DataFrameFunctionsSuite | 1 minutes 37 seconds | 0 | 0 | 102 | 102
   DynamicPartitionPruningSuiteAEOn | 1 minutes 24 seconds | 0 | 0 | 22 | 22
   DataFrameSuite | 1 minutes 14 seconds | 0 | 2 | 157 | 159
   SubquerySuite | 1 minutes 12 seconds | 0 | 1 | 70 | 71
   SingleLevelAggregateHashMapSuite | 1 minutes 1 seconds | 0 | 0 | 50 | 50
   DataFrameAggregateSuite | 59 seconds | 0 | 0 | 50 | 50
   
   I checked the code of `SQLQueryTestSuite` and found that it loads the test data repeatedly.
   This PR improves the performance of `SQLQueryTestSuite`.
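
As a rough illustration of the problem described above (not the code in this PR), the sketch below memoizes fixture construction so that repeated requests for the same test data build it only once. `SharedFixtures`, `fixture`, and `loadCounts` are hypothetical names, and a plain in-memory sequence stands in for a Spark view.

```scala
import scala.collection.mutable

object SharedFixtures {
  // Hypothetical counter: how many times each fixture was actually built.
  val loadCounts: mutable.Map[String, Int] =
    mutable.Map.empty[String, Int].withDefaultValue(0)

  // Fixtures that have already been built, keyed by name.
  private val cache = mutable.Map.empty[String, Seq[(Int, String)]]

  // Stand-in for an expensive load; mirrors the shape of the `testdata` view.
  private def build(name: String): Seq[(Int, String)] = {
    loadCounts(name) += 1
    (1 to 100).map(i => (i, i.toString))
  }

  /** Returns the named fixture, building it only on the first request. */
  def fixture(name: String): Seq[(Int, String)] =
    cache.getOrElseUpdate(name, build(name))
}
```

Requesting `SharedFixtures.fixture("testdata")` from several test cases leaves `loadCounts("testdata")` at 1, which is the saving the PR is after.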
   
   
   ### Why are the changes needed?
   Improve the performance of `SQLQueryTestSuite`.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Jenkins test
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394840
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120521/
   Test FAILed.

[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605400395
 
 
   **[Test build #120522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120522/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611491484
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611339913
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449528
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407311
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25231/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605399979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25228/
   Test PASSed.

[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405279243
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   Because some test cases create a table or view that has the same name as the global temp view but a different schema.

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407311
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25231/
   Test PASSed.

[GitHub] [spark] cloud-fan commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611496100
 
 
   thanks, merging to master/3.0!

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611117493
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120977/
   Test FAILed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394155
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25227/
   Test PASSed.

[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449354
 
 
   **[Test build #120530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120530/testReport)** for PR 28060 at commit [`c392c03`](https://github.com/apache/spark/commit/c392c03794bdcb1c8a2634d4c4c0e2ed398b9b38).

[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611490753
 
 
   **[Test build #121015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121015/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407168
 
 
   **[Test build #120525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120525/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611016031
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405965961
 
 

 ##########
 File path: sql/core/src/test/resources/sql-tests/results/limit.sql.out
 ##########
 @@ -7,8 +7,8 @@ SELECT * FROM testdata LIMIT 2
 -- !query schema
 struct<key:int,value:string>
 -- !query output
-1	1
-2	2
+51	51
 
 Review comment:
   Can we add a sort before the limit?
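
A minimal sketch of why the sort matters, assuming a local `SparkSession`; `SortBeforeLimit` and `firstTwoByKey` are hypothetical names, not part of the suite. Without `ORDER BY`, Spark makes no guarantee about which rows a `LIMIT` returns, so a golden file can change whenever the physical layout of `testdata` changes (as in the `1`/`2` to `51` change quoted above).

```scala
import org.apache.spark.sql.SparkSession

object SortBeforeLimit {
  /** Registers a `testdata` view shaped like the suite's and returns the rows with the two smallest keys. */
  def firstTwoByKey(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._

    (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
      .repartition(4) // shuffle the rows so their physical order is arbitrary
      .createOrReplaceTempView("testdata")

    // ORDER BY pins down which rows LIMIT keeps, regardless of partitioning.
    spark.sql("SELECT key, value FROM testdata ORDER BY key LIMIT 2")
      .as[(Int, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("sort-before-limit")
      .getOrCreate()
    try firstTwoByKey(spark).foreach(println)
    finally spark.stop()
  }
}
```

With the sort in place the query returns keys 1 and 2 on every run, so the expected output no longer depends on how the data was loaded.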

[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r404660864
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 Review comment:
   @cloud-fan Thanks. I understand @maropu's suggestion now. I will try to use `createGlobalTempView` and share these views among all sessions.
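
A minimal sketch of the idea in this comment, assuming a local `SparkSession`; `GlobalTempViewSharing` and `countFromNewSession` are hypothetical names. It uses `createOrReplaceGlobalTempView` rather than `createGlobalTempView` only so that the example can be re-run safely. Global temp views are registered in the `global_temp` database (the default name) and are visible to every session of the same Spark application.

```scala
import org.apache.spark.sql.SparkSession

object GlobalTempViewSharing {
  /** Registers the view in one session and counts its rows from a fresh session. */
  def countFromNewSession(spark: SparkSession): Long = {
    import spark.implicits._

    (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
      .createOrReplaceGlobalTempView("testdata")

    // newSession() has its own temp views and SQL conf but shares global temp views.
    val other = spark.newSession()
    other.sql("SELECT count(*) FROM global_temp.testdata").head().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("global-temp-view-sharing")
      .getOrCreate()
    try println(countFromNewSession(spark))
    finally spark.stop()
  }
}
```

Note that queries must qualify the view as `global_temp.testdata`; an unqualified `testdata` would not resolve to the global view.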

[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449530
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25236/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605432950
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120525/
   Test PASSed.

[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405596713
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 Review comment:
   @cloud-fan @maropu I have replaced `df.createTempView` with `df.write.saveAsTable`.
   Because the original temp views are now tables, I had to regenerate some golden files.
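
For readers following the thread, here is a minimal sketch of why the switch from temp views to tables matters. It is not code from the PR. It relies on standard Spark behaviour: a temp view is scoped to the session that registered it, while a table written with `saveAsTable` lives in the shared catalog. The object name, `visibilityInNewSession`, and the `sketch_view` / `sketch_table` names are hypothetical, and the sketch assumes Spark SQL is on the classpath and a local warehouse directory is writable.

```scala
import org.apache.spark.sql.SparkSession

object TempViewVsTableSketch {
  /** Returns (viewVisible, tableVisible) as seen from a fresh session. */
  def visibilityInNewSession(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._
    val df = (1 to 100).map(i => (i, i.toString)).toDF("key", "value")

    // Session-scoped: registered only in the session that created it.
    df.createOrReplaceTempView("sketch_view")
    // Catalog-backed: written to the warehouse and shared across sessions.
    df.write.mode("overwrite").saveAsTable("sketch_table")

    // A new session shares the catalog but not the temp views.
    val other = spark.newSession()
    try {
      (other.catalog.tableExists("sketch_view"),
        other.catalog.tableExists("sketch_table"))
    } finally {
      // Clean up both objects so the sketch can be rerun.
      spark.sql("DROP TABLE IF EXISTS sketch_table")
      spark.catalog.dropTempView("sketch_view")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("temp-view-vs-table-sketch")
      .getOrCreate()
    try println(visibilityInNewSession(spark)) finally spark.stop()
  }
}
```

Under these assumptions the table only needs to be created once and can be reused by every session, whereas a temp view would have to be re-registered in each one.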


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406477
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] wangyum commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

wangyum commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406816
 
 
   retest this please


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449530
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25236/
   Test PASSed.


[GitHub] [spark] maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405524802
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   Like this? https://github.com/apache/spark/compare/master...maropu:SPARK-31291


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406479
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120522/
   Test FAILed.


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394050
 
 
   **[Test build #120521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120521/testReport)** for PR 28060 at commit [`0511d99`](https://github.com/apache/spark/commit/0511d9987a79eb9a18ece16f5ec6bf0168c18b6a).


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611339913
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405547300
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   > master...maropu:SPARK-31291
   
   That approach works, but it would cause too many changes. After an offline discussion with @cloud-fan, I will try replacing `df.createTempView` with `df.write.saveAsTable`.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611374521
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25707/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605432946
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611491492
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121015/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611117479
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341982
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25697/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605495372
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407168
 
 
   **[Test build #120525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120525/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405967900
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -668,6 +690,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
     try {
       TimeZone.setDefault(originalTimeZone)
       Locale.setDefault(originalLocale)
+      unloadTestData(spark)
 
 Review comment:
   OK.


[GitHub] [spark] dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605756961
 
 
   Hi, All.
   I converted this issue into a subtask of SPARK-25604.
   FYI, SPARK-25604 is technically already resolved: the test framework was enhanced to parallelize execution of the slow test suites, and `SQLQueryTestSuite` is already parallelized in all SBT builds (including PRBuilder). So this PR doesn't improve the total testing time in the SBT environment; its benefit is limited to the **Maven** environment.
   
   cc @gatorsmile 


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394155
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25227/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611491492
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121015/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611117100
 
 
   **[Test build #120977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120977/testReport)** for PR 28060 at commit [`45bba4c`](https://github.com/apache/spark/commit/45bba4cb7cbbc9dc86e0d6e45c985c31d002a8e1).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605399977
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405966276
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -668,6 +690,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
     try {
       TimeZone.setDefault(originalTimeZone)
       Locale.setDefault(originalLocale)
+      unloadTestData(spark)
 
 Review comment:
   I'd prefer `createTestTables` and `removeTestTables`
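
The suggested naming pairs setup with teardown. A minimal sketch of that pairing in plain Scala, assuming a hypothetical in-memory `registered` set in place of the real session catalog; `TestTableLifecycle` and `withTestTables` are illustrative names, not part of the PR:

```scala
import scala.collection.mutable

object TestTableLifecycle {
  // Hypothetical stand-in for the session catalog: names of views that currently exist.
  val registered: mutable.Set[String] = mutable.LinkedHashSet.empty[String]

  // Table names that appear in the diff under review.
  private val known: Set[String] =
    Set("testdata", "arraydata", "mapdata", "aggtest", "onek", "tenk1")

  /** Creates only the requested tables; unknown names are ignored. */
  def createTestTables(testTables: Seq[String]): Unit =
    testTables.filter(known.contains).foreach(registered += _)

  /** Removes the requested tables; removing a missing table is a no-op. */
  def removeTestTables(testTables: Seq[String]): Unit =
    testTables.foreach(registered -= _)

  /** Runs `body` with the tables present and removes them even if `body` throws. */
  def withTestTables[A](testTables: Seq[String])(body: => A): A = {
    createTestTables(testTables)
    try body finally removeTestTables(testTables)
  }
}
```

Wrapping both calls in one helper keeps the cleanup in a `finally` block, so a failing test case would not leave tables behind for the next one.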


[GitHub] [spark] cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405266677
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   > we must not use sparkSession.newSession().
   
   Why? All the `SparkSession`s share one `SharedState`, and the global temp views are stored in `SharedState`.


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449354
 
 
   **[Test build #120530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120530/testReport)** for PR 28060 at commit [`c392c03`](https://github.com/apache/spark/commit/c392c03794bdcb1c8a2634d4c4c0e2ed398b9b38).


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611374521
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25707/
   Test PASSed.


[GitHub] [spark] gatorsmile commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

gatorsmile commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605761775
 
 
   This was assigned to @beliefer after our offline talk. He is trying to find out why SQLQueryTestSuite took 35 minutes to finish. The time cost of each step/phase can help us locate the root cause. It would be interesting to know whether our compiler overhead is too big for these short queries.
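
One way to collect per-phase time costs is a small timing helper. The following is only a sketch: `PhaseTimer`, `Recorder`, and the phase names used with it are hypothetical and not taken from `SQLQueryTestSuite`.

```scala
object PhaseTimer {
  /** Runs `body` and returns its result with the elapsed wall-clock time in nanoseconds. */
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  /** Accumulates elapsed time per phase name so the slowest phase stands out. */
  final class Recorder {
    private val totals = scala.collection.mutable.LinkedHashMap.empty[String, Long]

    /** Times `body` and adds the elapsed time to the running total for `phase`. */
    def record[A](phase: String)(body: => A): A = {
      val (result, nanos) = timed(body)
      totals(phase) = totals.getOrElse(phase, 0L) + nanos
      result
    }

    /** Phases sorted from slowest to fastest. */
    def report: Seq[(String, Long)] =
      totals.toSeq.sortBy { case (_, nanos) => -nanos }
  }
}
```

Wrapping steps such as data loading, query compilation, and execution in `record` calls would show which one dominates the 35 minutes.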


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611373959
 
 
   **[Test build #121015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121015/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407308
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611117479
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394839
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341499
 
 
   **[Test build #121005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121005/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).


[GitHub] [spark] dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612313246
 
 
   Hi, all.
   This seems to break all Maven Jenkins jobs in both `master` and `branch-3.0`. The following is an example.
   - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/211/
   
   Could you take a look?


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341982
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25697/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611367293
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121005/
   Test FAILed.


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-610129370
 
 
   cc @cloud-fan 


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605432723
 
 
   **[Test build #120525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120525/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612313246
 
 
   Hi, all.
   This seems to break all Maven Jenkins jobs in both `master` and `branch-3.0`. The following is an example.
   - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/211/
   ```
   org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite *** ABORTED ***
   ```
   Could you take a look?


[GitHub] [spark] dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612322442
 
 
   I made a follow-up PR.
   - https://github.com/apache/spark/pull/28186


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605449528
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611367162
 
 
   **[Test build #121005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121005/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612317265
 
 
   One quick fix is to copy the test file from `jar:file:/home/jenkins/workspace/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/sql/core/target/spark-sql_2.12-3.0.1-SNAPSHOT-tests.jar!/test-data/postgresql/agg.data` to a local file.
   
   Since this PR is about performance, note that the fix will slightly increase the test time because of the copying.
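
A rough sketch of the copy step described above, using only `java.nio.file`. `TestFileCopy` and `copyToLocal` are hypothetical names, and this is not necessarily how the follow-up fix is implemented.

```scala
import java.net.URL
import java.nio.file.{Files, Path, StandardCopyOption}

object TestFileCopy {
  /**
   * Copies the bytes behind `resource` (which may be a `jar:file:...!/...` URL)
   * into a fresh temporary file and returns that file's path.
   */
  def copyToLocal(resource: URL, suffix: String = ".data"): Path = {
    val target = Files.createTempFile("test-data-", suffix)
    // Ask the JVM to delete the copy on exit so repeated runs do not pile up files.
    target.toFile.deleteOnExit()
    val in = resource.openStream()
    try {
      // createTempFile already created an empty file, so overwrite it.
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    target
  }
}
```

A caller could pass the URL that a class loader's `getResource` returns for `test-data/postgresql/agg.data` and hand the resulting path to the CSV reader, at the cost of one extra copy per file.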


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611339919
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25694/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611367288
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405319940
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   ```
   val a = spark.sql("show views;")
   a.show();

   +---------+---------+-----------+
   |namespace| viewName|isTemporary|
   +---------+---------+-----------+
   |         |  aggtest|       true|
   |         |arraydata|       true|
   |         |  mapdata|       true|
   |         |     onek|       true|
   |         |    tenk1|       true|
   +---------+---------+-----------+
   ```
   
   ```
   val a2 = localSparkSession.sql("show views;")
   a2.show();

   +---------+--------+-----------+
   |namespace|viewName|isTemporary|
   +---------+--------+-----------+
   +---------+--------+-----------+
   ```
   Maybe I missed something?


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394835
 
 
   **[Test build #120521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120521/testReport)** for PR 28060 at commit [`0511d99`](https://github.com/apache/spark/commit/0511d9987a79eb9a18ece16f5ec6bf0168c18b6a).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r399898528
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   To avoid the overhead of per-session initialization, can't we just move these local temp views into a session-independent place, e.g., global temp views?
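   
   A minimal sketch of what the global temp view option could look like, assuming Spark SQL is on the classpath and the default `global_temp` database name (the object and method names are hypothetical and not part of this PR):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Hypothetical helper, for illustration only.
   object GlobalTempViewSketch {
     /** Registers the view once, then counts its rows from a different session. */
     def rowsSeenFromNewSession(): Long = {
       val spark = SparkSession.builder()
         .master("local[1]") // assumption: local mode is enough here
         .appName("global-temp-view-sketch")
         .getOrCreate()
       import spark.implicits._
   
       // Global temp views belong to the application, not to one session.
       (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
         .createOrReplaceGlobalTempView("testdata")
   
       // A new session can read the view, but only through the global_temp database.
       val other = spark.newSession()
       val rows = other.sql("SELECT * FROM global_temp.testdata").count()
       spark.stop()
       rows
     }
   }
   ```
   
   One caveat with this approach: queries have to reference the view as `global_temp.testdata`, so existing `.sql` inputs that select from an unqualified `testdata` would need to change.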


[GitHub] [spark] dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605756961
 
 
   Hi, All.
   I converted this issue into a subtask of SPARK-25604.
   FYI, technically, SPARK-25604 is already resolved: the test framework was enhanced to parallelize execution of the slow test suites, and `SQLQueryTestSuite` and `ThriftServerQueryTestSuite.scala` are already parallelized in all SBT builds and tests (including PRBuilder). So this PR doesn't improve the total testing time in the SBT environment at all. Its benefit is limited to the **Maven** environment.
   
   cc @gatorsmile 


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611374512
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605495065
 
 
   **[Test build #120530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120530/testReport)** for PR 28060 at commit [`c392c03`](https://github.com/apache/spark/commit/c392c03794bdcb1c8a2634d4c4c0e2ed398b9b38).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394050
 
 
   **[Test build #120521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120521/testReport)** for PR 28060 at commit [`0511d99`](https://github.com/apache/spark/commit/0511d9987a79eb9a18ece16f5ec6bf0168c18b6a).


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405285989
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   @cloud-fan 
   If I use `sparkSession.newSession()`, test cases fail, for example:
   ```
   23:25:05.078 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: 
   [info] - operators.sql *** FAILED *** (5 seconds, 722 milliseconds)
   [info]   operators.sql
   [info]   Expected "struct<[(- key):int,(+ key):int]>", but got "struct<[]>" Schema did not match for query #4
   [info]   select -key, +key from testdata where key = 2: -- !query
   [info]   select -key, +key from testdata where key = 2
   [info]   -- !query schema
   [info]   struct<>
   [info]   -- !query output
   [info]   org.apache.spark.sql.AnalysisException
   [info]   Table or view not found: testdata; line 1 pos 23 (SQLQueryTestSuite.scala:464)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
   ```


[GitHub] [spark] cloud-fan commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611343954
 
 
   The change LGTM. Can you regenerate the benchmark numbers?


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406428
 
 
   **[Test build #120522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120522/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394840
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120521/
   Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605495374
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120530/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605400395
 
 
   **[Test build #120522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120522/testReport)** for PR 28060 at commit [`8eac8d6`](https://github.com/apache/spark/commit/8eac8d681a0737e19d44cf10192fae2285c78f47).


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605399979
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25228/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611373959
 
 
   **[Test build #121015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121015/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394839
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611016043
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25668/
   Test PASSed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405285989
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   @cloud-fan 
   When I use `sparkSession.newSession()`, the test cases fail, for example:
   ```
   23:25:05.078 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: 
   [info] - operators.sql *** FAILED *** (5 seconds, 722 milliseconds)
   [info]   operators.sql
   [info]   Expected "struct<[(- key):int,(+ key):int]>", but got "struct<[]>" Schema did not match for query #4
   [info]   select -key, +key from testdata where key = 2: -- !query
   [info]   select -key, +key from testdata where key = 2
   [info]   -- !query schema
   [info]   struct<>
   [info]   -- !query output
   [info]   org.apache.spark.sql.AnalysisException
   [info]   Table or view not found: testdata; line 1 pos 23 (SQLQueryTestSuite.scala:464)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
   ```


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605495374
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120530/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605432950
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120525/
   Test PASSed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405968827
 
 

 ##########
 File path: sql/core/src/test/resources/sql-tests/results/limit.sql.out
 ##########
 @@ -7,8 +7,8 @@ SELECT * FROM testdata LIMIT 2
 -- !query schema
 struct<key:int,value:string>
 -- !query output
-1	1
-2	2
+51	51
 
 Review comment:
   I resolved the issue by using `repartition`.
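
One plausible reading of the diff above: once `testdata` is saved as a table backed by several files, the rows returned by `LIMIT 2` depend on which file is read first, which would explain the golden output moving from keys 1 and 2 to 51. The sketch below is a minimal illustration under that assumption, not the PR's code. The table name `testdata_demo`, the object name, and the temporary warehouse directory are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SingleFileTestTable {
  /** Saves keys 1..100 as a Parquet table and returns how many data files back it. */
  def writeAndCountFiles(spark: SparkSession, table: String): Int = {
    spark.createDataFrame((1 to 100).map(i => (i, i.toString)))
      .toDF("key", "value")
      .repartition(1) // collapse to one partition, hence one output file
      .write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable(table)
    spark.table(table).inputFiles.length
  }

  def main(args: Array[String]): Unit = {
    // A throwaway warehouse directory keeps reruns from colliding (assumption for this sketch).
    val warehouse = Files.createTempDirectory("warehouse-demo").toString
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("single-file-test-table")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()
    try {
      println(writeAndCountFiles(spark, "testdata_demo"))
    } finally {
      spark.stop()
    }
  }
}
```

With a single backing file there is only one place for a scan to start, which removes the dependence on which file is read first.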


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r399918579
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   That is not the key point of the conflict. For example, test case A creates a view named `testdata`, and test case B also creates a view named `testdata`, but the two views have different schemas. If the same session is shared globally, the two definitions conflict, especially under parallel execution.
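
The conflict described here can be reproduced in a few lines. The following is a minimal sketch, assuming a local SparkSession; the object name, column names, and sample rows are hypothetical. It runs sequentially, so it only shows one view definition replacing the other, not the race that parallel execution would add.

```scala
import org.apache.spark.sql.SparkSession

object TempViewIsolation {
  /** Registers two differently shaped `testdata` views and returns the columns each session sees. */
  def columnsSeen(a: SparkSession, b: SparkSession): (Seq[String], Seq[String]) = {
    // "Test case A": a two-column view.
    a.createDataFrame(Seq((1, "1"))).toDF("key", "value")
      .createOrReplaceTempView("testdata")
    // "Test case B": same view name, different schema.
    b.createDataFrame(Seq(("x", 1.0, true))).toDF("name", "score", "flag")
      .createOrReplaceTempView("testdata")
    (a.table("testdata").columns.toSeq, b.table("testdata").columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("temp-view-isolation")
      .getOrCreate()
    try {
      // Separate sessions: each keeps its own definition of `testdata`.
      println(columnsSeen(spark.newSession(), spark.newSession()))
      // One shared session: B's view replaces A's, so A now reads B's schema.
      println(columnsSeen(spark, spark))
    } finally {
      spark.stop()
    }
  }
}
```

Temporary views are scoped to the session that registers them, so `newSession()` isolates the two test cases at the cost of registering the shared test data once per session.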


[GitHub] [spark] cloud-fan closed pull request #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

cloud-fan closed pull request #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060
 
 
   


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611547251
 
 
   @cloud-fan @maropu Thanks for reviewing this PR.
   @dongjoon-hyun @gatorsmile @wangyum Thanks for your help!


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611117493
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120977/
   Test FAILed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405279243
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   Because some test cases create a table or view with the same name as the global temp view, but with a different schema.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405257472
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   @cloud-fan @maropu 
   I tried to use `createGlobalTempView`, but it failed.
   If we want to share the global temp view, we must not use `sparkSession.newSession()`.
   But currently every test case uses a new session created by `sparkSession.newSession()`.
   If all the test cases used the same `sparkSession`, another issue would occur. For example,
   test case A creates a view named `testdata`, and test case B also creates a view named `testdata`, but the two views have different schemas. If the same session is shared globally, the two definitions conflict, especially under parallel execution.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611367293
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121005/
   Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605432946
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] maropu commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

maropu commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605410032
 
 
   Looks cool, thanks for the work, @beliefer! By the way, how long does `SQLQueryTestSuite` take with this fix? I just want to know the total running time.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611016031
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341971
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-611491484
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611371394
 
 
   retest this please


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611374512
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605495372
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611367288
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#discussion_r407011922
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -572,30 +571,40 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def createTestTables(session: SparkSession): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
+      .repartition(1)
+      .write
+      .format("parquet")
+      .saveAsTable("testdata")
 
     ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
       .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
+      .write
+      .format("parquet")
+      .saveAsTable("arraydata")
 
     (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
       Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
       Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
       Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
       Tuple1(Map(1 -> "a5")) :: Nil)
       .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
+      .write
+      .format("parquet")
+      .saveAsTable("mapdata")
 
     session
       .read
       .format("csv")
       .options(Map("delimiter" -> "\t", "header" -> "false"))
       .schema("a int, b float")
       .load(testFile("test-data/postgresql/agg.data"))
 
 Review comment:
   This seems to be the root cause of failure.
   ```
    java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/jenkins/workspace/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/sql/core/target/spark-sql_2.12-3.0.1-SNAPSHOT-tests.jar!/test-data/postgresql/agg.data
   ```
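
The wording of the exception matches what `java.net.URI` reports when a scheme is combined with a path that does not begin with `/`. The sketch below is a minimal illustration, assuming the `jar:` resource URL ends up split into the scheme `jar` and the path `file:/...` (an assumption about where the failure arises, not something the log confirms). The object name and the file paths are hypothetical.

```scala
import java.net.{URI, URISyntaxException}

object JarResourcePath {
  /** Assembles a hierarchical URI from a scheme and a path; returns the failure reason on error. */
  def tryBuild(scheme: String, path: String): Either[String, URI] =
    try {
      // The five-argument constructor rejects a non-empty path that lacks a
      // leading '/' whenever a scheme is present.
      Right(new URI(scheme, null, path, null, null))
    } catch {
      case e: URISyntaxException => Left(e.getReason)
    }

  def main(args: Array[String]): Unit = {
    // A test resource packaged inside a tests jar (hypothetical location).
    println(tryBuild("jar", "file:/tmp/tests.jar!/test-data/agg.data"))
    // The same resource as a plain file on disk (hypothetical location).
    println(tryBuild("file", "/tmp/test-data/agg.data"))
  }
}
```

Under this reading, the load works when the test resources sit on disk as plain files and fails when they are resolved from inside the tests jar, which is consistent with the failure appearing only on the Maven job.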


[GitHub] [spark] dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612315314
 
 
   Since I found the root cause, I'll make a follow-up PR soon.


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611015204
 
 
   **[Test build #120977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120977/testReport)** for PR 28060 at commit [`45bba4c`](https://github.com/apache/spark/commit/45bba4cb7cbbc9dc86e0d6e45c985c31d002a8e1).


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394152
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406477
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605394152
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605407308
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611015204
 
 
   **[Test build #120977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120977/testReport)** for PR 28060 at commit [`45bba4c`](https://github.com/apache/spark/commit/45bba4cb7cbbc9dc86e0d6e45c985c31d002a8e1).


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405257472
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 @@ -570,82 +579,94 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
   }
 
   /** Load built-in test tables into the SparkSession. */
-  private def loadTestData(session: SparkSession): Unit = {
+  private def loadTestData(session: SparkSession, testTables: Seq[String]): Unit = {
     import session.implicits._
 
-    (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
-
-    ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
-      .toDF("arraycol", "nestedarraycol")
-      .createOrReplaceTempView("arraydata")
-
-    (Tuple1(Map(1 -> "a1", 2 -> "b1", 3 -> "c1", 4 -> "d1", 5 -> "e1")) ::
-      Tuple1(Map(1 -> "a2", 2 -> "b2", 3 -> "c2", 4 -> "d2")) ::
-      Tuple1(Map(1 -> "a3", 2 -> "b3", 3 -> "c3")) ::
-      Tuple1(Map(1 -> "a4", 2 -> "b4")) ::
-      Tuple1(Map(1 -> "a5")) :: Nil)
-      .toDF("mapcol")
-      .createOrReplaceTempView("mapdata")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema("a int, b float")
-      .load(testFile("test-data/postgresql/agg.data"))
-      .createOrReplaceTempView("aggtest")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/onek.data"))
-      .createOrReplaceTempView("onek")
-
-    session
-      .read
-      .format("csv")
-      .options(Map("delimiter" -> "\t", "header" -> "false"))
-      .schema(
-        """
-          |unique1 int,
-          |unique2 int,
-          |two int,
-          |four int,
-          |ten int,
-          |twenty int,
-          |hundred int,
-          |thousand int,
-          |twothousand int,
-          |fivethous int,
-          |tenthous int,
-          |odd int,
-          |even int,
-          |stringu1 string,
-          |stringu2 string,
-          |string4 string
-        """.stripMargin)
-      .load(testFile("test-data/postgresql/tenk.data"))
-      .createOrReplaceTempView("tenk1")
+    if (testTables.contains("testdata")) {
+      (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata")
+    }
+
+    if (testTables.contains("arraydata")) {
+      ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
+        .toDF("arraycol", "nestedarraycol")
+        .createOrReplaceTempView("arraydata")
 
 Review comment:
   @cloud-fan @maropu 
   I tried to use `createGlobalTempView`, but it failed.
   If we want to share the global temp view, we must not use `sparkSession.newSession()`, but every test case currently uses a new session created by `sparkSession.newSession()`.
   If all the test cases used the same `sparkSession` instead, another issue would occur. For example:
   test case A creates a view `testdata`, and test case B also creates a view `testdata`, but the schemas of the two views differ. If the same session is shared globally, the two definitions conflict, especially under parallel execution.
   `SQLQueryTestSuite` is designed to isolate test cases so that they do not affect each other, which is why it uses `sparkSession.newSession()`.
   If we really want to share the `sparkSession` and the global temp view, we must resolve these conflicts and modify a large number of test SQL files.
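
The diff above loads a table only when the test case lists it in `testTables`. A minimal, Spark-free sketch of that idea follows. The names `SelectiveLoaderSketch`, `loaders`, `register`, and `loaded` are hypothetical and are not part of the PR; `register` merely stands in for the real `createOrReplaceTempView` calls.

```scala
import scala.collection.mutable

object SelectiveLoaderSketch {
  // Records which loaders actually ran (a stand-in for registering a temp view).
  val loaded: mutable.ListBuffer[String] = mutable.ListBuffer.empty[String]

  private def register(name: String): Unit = loaded.append(name)

  // Hypothetical registry: table name -> action that would build and register it.
  val loaders: Map[String, () => Unit] =
    Seq("testdata", "arraydata", "mapdata", "aggtest", "onek", "tenk1")
      .map(name => name -> (() => register(name)))
      .toMap

  // Run only the loaders for the tables a test case asks for;
  // names without a loader are silently ignored.
  def loadTestData(testTables: Seq[String]): Unit =
    testTables.distinct.foreach { name =>
      loaders.get(name).foreach(load => load())
    }

  def main(args: Array[String]): Unit = {
    loadTestData(Seq("testdata", "onek"))
    println(loaded.mkString(", "))
  }
}
```

A lookup table keeps the per-table `if (testTables.contains(...))` checks in one place, at the cost of an extra layer of indirection compared with the explicit branches in the diff.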


[GitHub] [spark] dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605756961
 
 
   Hi, All.
   I converted this issue into a subtask of SPARK-25604.
   FYI, technically, SPARK-25604 is already resolved: the test framework was enhanced to parallelize execution of the slow test suites, and `SQLQueryTestSuite` and `ThriftServerQueryTestSuite.scala` are already parallelized in all SBT builds and tests (including PRBuilder). So, this doesn't improve the total testing time in the SBT environment at all. The benefit of this PR is limited to the **Maven** environment only.
   
   cc @gatorsmile and @gengliangwang 


[GitHub] [spark] cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r404590023
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 Review comment:
   > If the same session is shared globally
   
   I think that's not what @maropu means. We still create a fresh session for each test file, but the test views are created as global temp views, which are shared across all sessions.
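
The behaviour described here can be sketched as follows, assuming a local Spark build on the classpath. The object name `GlobalTempViewSketch` and the view names `local_testdata` and `shared_testdata` are made up for illustration, and `global_temp` is the default name of the global temp database (it is configurable). This is not the code from the PR.

```scala
import org.apache.spark.sql.SparkSession

object GlobalTempViewSketch {
  // Returns (is the session-local view visible in a new session?,
  //          number of rows read through the global temp view).
  def run(spark: SparkSession): (Boolean, Long) = {
    import spark.implicits._
    val data = (1 to 100).map(i => (i, i.toString)).toDF("key", "value")

    // Session-scoped: visible only in the session that created it.
    data.createOrReplaceTempView("local_testdata")
    // Application-scoped: registered once, visible from every session.
    data.createOrReplaceGlobalTempView("shared_testdata")

    // A fresh, isolated session, like the one each test file gets.
    val perFileSession = spark.newSession()
    val localVisible = perFileSession.catalog.tableExists("local_testdata")
    // Global temp views must be qualified with the global temp database.
    val sharedRows = perFileSession.table("global_temp.shared_testdata").count()
    (localVisible, sharedRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("global-temp-view-sketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

The new session cannot see the session-local view but can read all 100 rows through the global one. The trade-off is that queries have to reference the view with the `global_temp.` prefix, which would mean touching the test SQL files.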


[GitHub] [spark] SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341499
 
 
   **[Test build #121005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121005/testReport)** for PR 28060 at commit [`af42f50`](https://github.com/apache/spark/commit/af42f50c53aa9be5fa7540591fe2a6277357377c).


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-608193938
 
 
   @dongjoon-hyun I think even in parallel execution, this PR will still help.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611339919
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25694/
   Test PASSed.


[GitHub] [spark] dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on issue #28060: [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases
URL: https://github.com/apache/spark/pull/28060#issuecomment-612322442
 
 
   I made a follow-up PR to recover `master` and `branch-3.0`.
   - https://github.com/apache/spark/pull/28186


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611347876
 
 
   > the change LGTM, can you regenerate the benchmark numbers?
   
   OK.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611016043
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25668/
   Test PASSed.


[GitHub] [spark] maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r404672548
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 Review comment:
   Sorry, I missed your reply, @beliefer. Yeah, that's what I wanted to say. Thanks, @cloud-fan. I'll check later.


[GitHub] [spark] AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-611341971
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605447533
 
 
   > Looks cool, thanks for the work, @beliefer ! btw, how long will `SQLQueryTestSuite` take with this fix? I just want to know the total running time.
   
   With this optimization, the total running time is nearly one minute shorter than before.


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605406479
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120522/
   Test FAILed.


[GitHub] [spark] beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

beliefer commented on a change in pull request #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#discussion_r405319940
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
 ##########
 Review comment:
   ```
   val a = spark.sql("show views;")
   a.show();
   ```
   +---------+---------+-----------+
   |namespace| viewName|isTemporary|
   +---------+---------+-----------+
   |         |  aggtest|       true|
   |         |arraydata|       true|
   |         |  mapdata|       true|
   |         |     onek|       true|
   |         |    tenk1|       true|
   +---------+---------+-----------+
   
   ```
   val localSparkSession = spark.newSession()
   val a2 = localSparkSession.sql("show views;")
   a2.show();
   ```
   +---------+--------+-----------+
   |namespace|viewName|isTemporary|
   +---------+--------+-----------+
   +---------+--------+-----------+
   Maybe I'm missing something?
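   
   For reference, the same check can be written as a small standalone program. This is only a sketch, assuming Spark SQL is on the classpath and a local master is acceptable; the object name `TempViewScopeSketch` and the method `viewVisibility` are hypothetical and are not part of `SQLQueryTestSuite`. It registers one temporary view in a session and then asks both that session and a session created with `newSession()` whether the view exists.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Hypothetical helper, for illustration only; not code from this PR.
   object TempViewScopeSketch {
   
     // Returns (visible in the original session, visible in a new session).
     def viewVisibility(spark: SparkSession): (Boolean, Boolean) = {
       import spark.implicits._
   
       // Register a temporary view in the original session.
       Seq((1, "1"), (2, "2")).toDF("key", "value").createOrReplaceTempView("testdata")
   
       // newSession() shares the SparkContext but keeps its own temporary views.
       val localSparkSession = spark.newSession()
   
       (spark.catalog.tableExists("testdata"),
         localSparkSession.catalog.tableExists("testdata"))
     }
   
     def main(args: Array[String]): Unit = {
       // Assumption: a single local thread is enough for this check.
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("temp-view-scope-sketch")
         .getOrCreate()
       try {
         val (inOriginal, inNewSession) = viewVisibility(spark)
         println(s"original session sees testdata: $inOriginal")
         println(s"new session sees testdata: $inNewSession")
       } finally {
         spark.stop()
       }
     }
   }
   ```
   
   If temporary views are scoped to the session that created them, as the two `show views` outputs above suggest, the second value should be `false`, which would mean each new session has to load the views it needs itself.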


[GitHub] [spark] AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on issue #28060: [SPARK-31291][SQL][TEST] Avoid load test data if test case not uses them
URL: https://github.com/apache/spark/pull/28060#issuecomment-605399977
 
 
   Merged build finished. Test PASSed.
