You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/01 05:38:32 UTC

[GitHub] [spark] LuciferYang opened a new pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

LuciferYang opened a new pull request #29608:
URL: https://github.com/apache/spark/pull/29608


   ### What changes were proposed in this pull request?
   `sql-expression-schema.md` automatically generated by ExpressionsSchemaSuite, but only expressions entry are checked in `ExpressionsSchemaSuite`. So if we manually modify the contents of the file,  `ExpressionsSchemaSuite` does not necessarily guarantee the correctness of the it. For example, Spark-24884 added `regexp_extract_all`  expression support, and manually modify the `sql-expression-schema.md` but not change the content of `Number of queries` cause file content inconsistency.
   
   Some checks have been added to `ExpressionsSchemaSuite` to improve the correctness guarantee of `sql-expression-schema.md` as follow:
   
   - `Number of queries` should equals size of `expressions entries`
   
   - `Number of expressions that missing example` should equals size of `Expressions missing examples`
   
   - `MissExamples` from case should same as  `expectedMissExamples` from `sql-expression-schema.md`
   
   ### Why are the changes needed?
   Ensure the correctness of `sql-expression-schema.md` content.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Enhanced 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482821868



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       I got it. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481152252



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = {

Review comment:
       No big deal, but the types can just be added to the two tuple elements, instead of declaring them separately as a type for the whole tuple

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt

Review comment:
       miss -> missed or missing, in most cases in this change

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt
+      val missExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerate the result files.")

Review comment:
       regenerate -> regenerating




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686375088


   **[Test build #128249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128249/testReport)** for PR 29608 at commit [`6e6489b`](https://github.com/apache/spark/commit/6e6489bff1291cf5d409b04b7dd21e56daf88e3c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482794951



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       > You can try to mock a new function with comments and test this suite.
   
   You are right,  [Spark-24884](https://github.com/apache/spark/pull/27507) already trigger this problem cause by incomplete manual modification.
   
   With the new function add scene, I think the right way is run this case with `SPARK_GENERATE_GOLDEN_FILES = 1` to automatically regenerate the correct `sql-expression-schema.md` becasuse `sql-expression-schema.md` header said `Automatically generated by ExpressionsSchemaSuite`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684846608


   **[Test build #128142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128142/testReport)** for PR 29608 at commit [`6d36603`](https://github.com/apache/spark/commit/6d366031369998c77fffb787b920d66aa25a343f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686563956


   Thx ~ @maropu 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685275797


   **[Test build #128178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128178/testReport)** for PR 29608 at commit [`bf0ceea`](https://github.com/apache/spark/commit/bf0ceea95e42cb510120cea4632c8fc350d36315).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685395711


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686217155


   **[Test build #128222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128222/testReport)** for PR 29608 at commit [`85299b1`](https://github.com/apache/spark/commit/85299b15d0ab1b2e7d3f6341394af50f16f6b812).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482669992



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExample = lines(3).split(":")(1).trim.toInt

Review comment:
       done ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686299387






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686215449






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686299295






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684576498


   **[Test build #128142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128142/testReport)** for PR 29608 at commit [`6d36603`](https://github.com/apache/spark/commit/6d366031369998c77fffb787b920d66aa25a343f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684578476






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482728485



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       ~`numberOfQueries == outputSize`~
   Ah, I got it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482597152



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExample = lines(3).split(":")(1).trim.toInt

Review comment:
       nit: `Example` -> `Examples`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686375088


   **[Test build #128249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128249/testReport)** for PR 29608 at commit [`6e6489b`](https://github.com/apache/spark/commit/6e6489bff1291cf5d409b04b7dd21e56daf88e3c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684848015






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685254629






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481579451



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt

Review comment:
       ok ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686406140


   Thx for your review @maropu @srowen @beliefer ~ 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686300145


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128221/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685393711


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128176/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686294015


   **[Test build #128236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128236/testReport)** for PR 29608 at commit [`5a574c3`](https://github.com/apache/spark/commit/5a574c36f2b6a7d0730857c64ba587aaa5c297cb).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482844352



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       Address [6e6489b](https://github.com/apache/spark/pull/29608/commits/6e6489bff1291cf5d409b04b7dd21e56daf88e3c) reorder the assertions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686213653






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685254629


   **[Test build #128176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128176/testReport)** for PR 29608 at commit [`5c8e29e`](https://github.com/apache/spark/commit/5c8e29eca30e32d5371fa4339db8eb9e0864c16f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684578476






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685254989






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482780532



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       You can try to mock a new function with comments and test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686300139






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686294650






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684344948


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686512873


   @maropu ` org.apache.spark.sql.hive.thriftserver.CliSuite.*` failed...  I think It doesn't seem to be caused by this pr, can you help trigger retest?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482729986



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerating the result files.")
+      assert(numberOfMissingExamples == missingExamples.size,
+        s"missing examples size: ${missingExamples.size} not same as " +
+          s"numberOfMissingExamples: $numberOfMissingExamples " +
+          "record in result file. Try regenerating the result files.")
+
+      (missingExamples, expectedOutputs)

Review comment:
       ditto

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")

Review comment:
       `expectedMissingExamples`

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       `numberOfQueries == outputSize`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482794951



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       > You can try to mock a new function with comments and test this suite.
   
   You are right,  [Spark-24884](https://github.com/apache/spark/pull/27507) already trigger this problem cause by incomplete manual modification.
   
   With a a new function add scene, I think the right way is run this case with `SPARK_GENERATE_GOLDEN_FILES = 1` to automatically regenerate the correct `sql-expression-schema.md`, becasuse `sql-expression-schema.md` header said `Automatically generated by ExpressionsSchemaSuite`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686835108


   Thanks! Merged to master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685256865


   @srowen [5c8e29e](https://github.com/apache/spark/pull/29608/commits/5c8e29eca30e32d5371fa4339db8eb9e0864c16f) resolved the commets


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685393681


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482804980



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       Therefore, if the file is manually modified instead of automatically generated,  I think the assertion failure caused by incorrect modification should be expected.
   
   Do we need to tell users clearly with `Try regenerating the result files with sys env  SPARK_GENERATE_GOLDEN_FILES = 1`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482794951



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       With a a new function add scene, the right way is run this case with `SPARK_GENERATE_GOLDEN_FILES = 1` to automatically regenerate the correct `sql-expression-schema.md`.
   
   The `sql-expression-schema.md` header said ` Automatically generated by ExpressionsSchemaSuite`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684567459


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482752434



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       @beliefer This assert place here to verify the consistency of the file contents to avoid inconsistency caused by manual modification, and I think line 189  `assert expectedOutputs.size == outputSize` achieves the same goal as `numberOfQueries == outputSize`. Is this acceptable?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu closed pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
maropu closed pull request #29608:
URL: https://github.com/apache/spark/pull/29608


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685393681






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686219162






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686500830


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684348043


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685390611






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482821868



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       I got it. Thanks! Could you put these assert on line 163, so that it looks clear




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686215449






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686500845


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128249/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684344948


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685395741


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128178/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686500830






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685254989






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686281798


   Good catch! Except for some minor issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang edited a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685256865


   @srowen [5c8e29e](https://github.com/apache/spark/pull/29608/commits/5c8e29eca30e32d5371fa4339db8eb9e0864c16f) and [bf0ceea](https://github.com/apache/spark/pull/29608/commits/bf0ceea95e42cb510120cea4632c8fc350d36315) the commets


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686375753






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686299442






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686213653


   **[Test build #128221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128221/testReport)** for PR 29608 at commit [`cbb0fcd`](https://github.com/apache/spark/commit/cbb0fcdca64bc498fb81f6619af43f2f81987aed).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686228668


   The failed check cause by ` Install R linter dependencies and SparkR`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684848015






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482794951



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       > You can try to mock a new function with comments and test this suite.
   
   You are right,  [Spark-24884](https://github.com/apache/spark/pull/27507) already trigger this problem cause by incomplete manual modification.
   
   With the new function add scene, I think the right way is run this case with `SPARK_GENERATE_GOLDEN_FILES = 1` to automatically regenerate the correct `sql-expression-schema.md`, becasuse `sql-expression-schema.md` header said `Automatically generated by ExpressionsSchemaSuite`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481582101



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = {

Review comment:
       First changed to `val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput])`, but I feel the type declaration here is redundant, so changed to `val (expectedMissingExamples, expectedOutputs)`. Is this acceptable?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482742258



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerating the result files.")
+      assert(numberOfMissingExamples == missingExamples.size,
+        s"missing examples size: ${missingExamples.size} not same as " +
+          s"numberOfMissingExamples: $numberOfMissingExamples " +
+          "record in result file. Try regenerating the result files.")
+
+      (missingExamples, expectedOutputs)

Review comment:
       done ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481582101



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = {

Review comment:
       First changed to `val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput])`, but I feel that the type declaration here is redundant, so changed to `val (expectedMissingExamples, expectedOutputs)`. Is this acceptable?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684576498


   **[Test build #128142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128142/testReport)** for PR 29608 at commit [`6d36603`](https://github.com/apache/spark/commit/6d366031369998c77fffb787b920d66aa25a343f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686529387


   > @maropu org.apache.spark.sql.hive.thriftserver.CliSuite.* failed... I think It doesn't seem to be caused by this pr, can you help trigger retest?
   
   All the tests in GitHub Actions passed, so this PR looks fine.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481578875



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt
+      val missExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerate the result files.")

Review comment:
       done ~

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt

Review comment:
       done ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482741930



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")

Review comment:
       done ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686375753






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686400269


   LGTM!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685276175






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686299387






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-684348043


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r481578827



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt
+      val missExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerate the result files.")

Review comment:
       ok ~

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
         s"Expected $expectedSize blocks in result file but got " +
           s"${outputSize + headerSize}. Try regenerate the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt
+      val missExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerate the result files.")

Review comment:
       ok ~




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685276175






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482728485



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       ~`numberOfQueries == outputSize`~
   Ah, I got it. But if a new function here, `expectedOutputs.size` must not equal to `numberOfQueries`.
   `expectedOutputs.size == outputSize` in fact.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482727400



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       I think should put  these assert on line 163




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686500048


   **[Test build #128249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128249/testReport)** for PR 29608 at commit [`6e6489b`](https://github.com/apache/spark/commit/6e6489bff1291cf5d409b04b7dd21e56daf88e3c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on a change in pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
beliefer commented on a change in pull request #29608:
URL: https://github.com/apache/spark/pull/29608#discussion_r482780532



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##########
@@ -152,23 +152,37 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
 
     val outputSize = outputs.size
     val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissingExamples, expectedOutputs) = {
       val expectedGoldenOutput = fileToString(resultFile)
       val lines = expectedGoldenOutput.split("\n")
       val expectedSize = lines.size
 
       assert(expectedSize == outputSize + headerSize,
         s"Expected $expectedSize blocks in result file but got " +
-          s"${outputSize + headerSize}. Try regenerate the result files.")
+          s"${outputSize + headerSize}. Try regenerating the result files.")
 
-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")
+      val expectedOutputs = Seq.tabulate(outputSize) { i =>
         val segments = lines(i + headerSize).split('|')
         QueryOutput(
           className = segments(1).trim,
           funcName = segments(2).trim,
           sql = segments(3).trim,
           schema = segments(4).trim)
       }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,

Review comment:
       You can try to mock a new function with comments and test this suite.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-685395711






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686294650






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29608: [SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29608:
URL: https://github.com/apache/spark/pull/29608#issuecomment-686219162






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org