You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/07/17 01:42:55 UTC

[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/14235

    [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite to check generated SQL directly

    ## What changes were proposed in this pull request?
    
    This issue improves `LogicalPlanToSQLSuite` to check the generated SQL directly by **structure**. So far, `LogicalPlanToSQLSuite` relies on  `checkHiveQl` to ensure the **successful SQL generation** and **answer equality**. However, it does not guarantee the generated SQL is the same or will not be changed unnoticeabley.
    
    The following is an example result of this issue. 
    ```scala
    -    checkHiveQl("SELECT * FROM parquet_t0 TABLESAMPLE(0.1 PERCENT) WHERE 1=0")
    +    checkHiveQl("SELECT * FROM parquet_t0 TABLESAMPLE(0.1 PERCENT) WHERE 1=0",
    +      """
    +        |SELECT `gen_attr` AS `id`
    +        |FROM (SELECT `gen_attr`
    +        |      FROM (SELECT `id` AS `gen_attr`
    +        |            FROM `default`.`parquet_t0`
    +        |            TABLESAMPLE(0.1 PERCENT))
    +        |            AS gen_subquery_0
    +        |      WHERE (1 = 0))
    +        |      AS parquet_t0
    +      """.stripMargin)
    ```
    
    
    ## How was this patch tested?
    
    Pass the Jenkins. This is only a testsuite change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16590

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14235
    
----
commit a3bb306af576834d93394d4ac636c5b52b6d745a
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-07-17T01:39:18Z

    [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to check generated SQL directly

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71092452
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,7 +85,34 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkHiveQl(hiveQl: String, answerFile: String = null): Unit = {
    --- End diff --
    
    Oh, it's due to that I found that some Windows queries generated SQL is non-deterministic. The order in the windows functions varies sometime. As a result, the test sometime passes and sometime not. Until now, those testcases are not transformed into this style in this PR.
    I'm planing to investigate the root cause later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62450/consoleFull)** for PR 14235 at commit [`a9a1b00`](https://github.com/apache/spark/commit/a9a1b00411fa6175dd1e628a3485a4313999b636).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun can you update the classdoc for the test suite to explain how to generate the expected results?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    `BROADCAST hints` is special. We need to check the statements.
    
    My concern is we might spend a lot of times when making changes that impact the SQL statement strings but it does not change the correctness of SQL builder. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094916
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    No problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun i'm not against merging this first. Just want to see if we can improve it further.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71092375
  
    --- Diff: sql/hive/src/test/resources/sqlgen/agg1.sql ---
    @@ -0,0 +1,3 @@
    +SELECT COUNT(value) FROM parquet_t1 GROUP BY key HAVING MAX(key) > 0
    +--------------------------------------------------------------------------------
    +SELECT `gen_attr` AS `count(value)` FROM (SELECT `gen_attr` FROM (SELECT count(`gen_attr`) AS `gen_attr`, max(`gen_attr`) AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_0 GROUP BY `gen_attr` HAVING (`gen_attr` > CAST(0 AS BIGINT))) AS gen_subquery_1) AS gen_subquery_2
    --- End diff --
    
    Oh, I see. Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Now, the answer files have both original and generated queries and the description of regenerating answer sets is added to `LogicalPlanToSQLSuite` classdoc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    If I miss some cost you concern, could you give some examples? I'm still not sure what you mean by cost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62444/consoleFull)** for PR 14235 at commit [`6df0c18`](https://github.com/apache/spark/commit/6df0c187245f1329de2732a844d6edede9440fb4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100943
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -101,95 +149,125 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |# Converted SQL query string:
                |$convertedSQL
                |
    -           |# Original HiveQL query string:
    -           |$hiveQl
    +           |# Original SQL query string:
    +           |$sqlString
                |
                |# Resolved query plan:
                |${df.queryExecution.analyzed.treeString}
              """.stripMargin, cause)
         }
       }
     
    +  test("checkSQL") {
    --- End diff --
    
    what i meant about inline comment was something like this
    
    ```
    test("test behavior for the golden file test harness") {
      // 1. Test should fail if the SQL query cannot be parsed.
      val m = intercept[ParseException] {
        ...
      }
      assert(...)
    
      // 2. Test should fail if the golden file cannot be found
      ...
    
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62446/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101452
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    What I mean is `sqlgen` is currently `sql/hive/src/test/resources/sqlgen`. So, `sql/hive/src/test/resources/sqlgen/README.md` is also copied like the other resources into `test-classes`. Probably, did I miss something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71111037
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,15 +17,32 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, NoSuchFileException, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
    +import org.apache.spark.sql.catalyst.parser.ParseException
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A test suite for LogicalPlan-to-SQL conversion.
    + *
    + * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
    + * built-in functionality to automatically generate these golden files.
    + *
    + * To re-generate golden files, run:
    + *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
    + */
     class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
       import testImplicits._
     
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = System.getenv("SPARK_GENERATE_GOLDEN_FILES") != null
    --- End diff --
    
    hm if it is 0 we probably shouldn't be running it, so i'd test the value against 1 
    
    ```
    Option(System.getenv("SPARK_GENERATE_GOLDEN_FILES")) == Some("1")


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101786
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    For this one, here is the code and the result in IntelliJ.
    ```
    val path = s"src/test/resources/sqlgen/$answerFile.sql"
    val path2 = getClass.getClassLoader.getResource("sqlgen/README.md")
    ```
    ```
    file:/Users/dongjoon/SPARK-CHECK-GEN-SQL/sql/hive/target/scala-2.11/test-classes/sqlgen/README.md
    ```
    Reading resource files are not problem, but we need the src directory for saving new golden SQL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095897
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    For this one, it seems we need to use that path when we save.
    When `getClass.getClassLoader.getResource` returns `test-classes/sqlgen`.
    Is it okay to keep that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094865
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    use classloader getResource to get the file, rather than hardcoding the path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun can you also look into having stable identifiers for gen_attr? Right now the golden files look really weird because gen_attr is used more than once.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71102449
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    Yep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62486/consoleFull)** for PR 14235 at commit [`244d013`](https://github.com/apache/spark/commit/244d01369a896e187d1d1078fb0d1f5e21ceb8f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Sure. I've been looking that. It's on my list.
    I'll make a JIRA issue and proceed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100858
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +88,58 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val header = "-- This file is automatically generated by LogicalPlanToSQLSuite."
    +        val answerText = s"$header\n${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    --- End diff --
    
    shouldn't we do an assert here instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094901
  
    --- Diff: sql/hive/src/test/resources/sqlgen/agg1.sql ---
    @@ -0,0 +1,3 @@
    +SELECT COUNT(value) FROM parquet_t1 GROUP BY key HAVING MAX(key) > 0
    --- End diff --
    
    one more thing -- can the generated file actually say
    
    ```
    -- This file is automatically generated by LogicalPlanToSQLSuite
    query
    --------------------------------------------------------------------------------
    generated query
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71102107
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    Ah ic. My worry is that Maven and SBT has base path where the tests are running, and the test will fail in Maven.
    
    How about we define the output directory as a variable in the beginning of the suite, and then use that when we write output out? For reading input we simply use "sqlgen/$answerFile.sql"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Can you also remove "[TEST]" from the title? TEST isn't a module.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62493/consoleFull)** for PR 14235 at commit [`efaa4d0`](https://github.com/apache/spark/commit/efaa4d0d55373280e19ed38b7e192545e4a3a6af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62420/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095116
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -102,7 +140,7 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |$convertedSQL
                |
                |# Original HiveQL query string:
    --- End diff --
    
    Oh. Yep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62486/consoleFull)** for PR 14235 at commit [`244d013`](https://github.com/apache/spark/commit/244d01369a896e187d1d1078fb0d1f5e21ceb8f9).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62448/consoleFull)** for PR 14235 at commit [`38f52ce`](https://github.com/apache/spark/commit/38f52cea6fbd53cf66429d396663708afcb1f193).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101205
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    What do you mean? What I meant is to create a README.md file in sqlgen directory, and at runtime we can get the location of sqlgen directory by getResource("sqlgen/README.md")


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Thanks - merging in master / 2.0.
    
    I'm also merging this in 2.0 since it is a test only change and will reduce merge conflicts.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100872
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -101,95 +149,125 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |# Converted SQL query string:
                |$convertedSQL
                |
    -           |# Original HiveQL query string:
    -           |$hiveQl
    +           |# Original SQL query string:
    +           |$sqlString
                |
                |# Resolved query plan:
                |${df.queryExecution.analyzed.treeString}
              """.stripMargin, cause)
         }
       }
     
    +  test("checkSQL") {
    --- End diff --
    
    add some comment inline explaining what we are actually testing here - also maybe rename the test name to "test behavior for the golden file test harness"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62469/consoleFull)** for PR 14235 at commit [`efaa4d0`](https://github.com/apache/spark/commit/efaa4d0d55373280e19ed38b7e192545e4a3a6af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094954
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    --- End diff --
    
    It'd be great to document what this function does too, e.g. 
    
    1. Checks generated SQL against golden files.
    2. Verifies the execution result stays the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095404
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    --- End diff --
    
    I added like this. Is it okay for you?
    ```scala
      /**
       * 1. Checks if SQL parsing succeed.
       * 2. Checks if SQL generation succeed.
       * 3. Checks the generated SQL against golden files.
       * 4. Verifies the execution result stays the same.
       */
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Hmm, I didn't remember correctly, but the result were different. For the Hint test cases, I used SQLConf to setup the testsuite because I knew that the optimizer would work differently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Yeah, we need to do it for `Parser`. That part is very independent. The changes on the other parts will not affect it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62444/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62493/consoleFull)** for PR 14235 at commit [`efaa4d0`](https://github.com/apache/spark/commit/efaa4d0d55373280e19ed38b7e192545e4a3a6af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71257369
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,15 +17,33 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, NoSuchFileException, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
    +import org.apache.spark.sql.catalyst.parser.ParseException
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A test suite for LogicalPlan-to-SQL conversion.
    + *
    + * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
    + * built-in functionality to automatically generate these golden files.
    + *
    + * To re-generate golden files, run:
    + *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
    + */
     class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
       import testImplicits._
     
    +  // Used for generating new query answer files by saving
    +  private val regenerateGoldenFiles =
    +    Option(System.getenv("SPARK_GENERATE_GOLDEN_FILES")).contains("1")
    --- End diff --
    
    Why did you use contains here? This is super confusing and also broke 2.10.
    
    I think I asked to do comparison with Some("1"). In most cases it is a very bad idea to use collection-oriented methods on Options, because they make the code more confusing.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094847
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    I'd say we should not include this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101148
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    Ur, I think any resource files under `resources` are copied into `test-classes` with classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62420/consoleFull)** for PR 14235 at commit [`a3bb306`](https://github.com/apache/spark/commit/a3bb306af576834d93394d4ac636c5b52b6d745a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    It sounds like the latest changes are unable to resolve Reynold's concern: how to regenerate all the expected SQL queries in bulk? 
    
    You know, for native view support, we are not generating optimal SQL statements. You can see the generated SQL is verbose and not readable. However, these SQL statements can be optimized by Catalyst optimizer at runtime. Thus, I have the same concern. This approach is not maintainable 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101198
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -101,95 +149,125 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |# Converted SQL query string:
                |$convertedSQL
                |
    -           |# Original HiveQL query string:
    -           |$hiveQl
    +           |# Original SQL query string:
    +           |$sqlString
                |
                |# Resolved query plan:
                |${df.queryExecution.analyzed.treeString}
              """.stripMargin, cause)
         }
       }
     
    +  test("checkSQL") {
    --- End diff --
    
    Oh, I see. I'll fix like that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    For example, if we change generation of alias names, most of test cases will fail. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Also personally I find it difficult to look at the result without looking at the original SQL. Can we update the result file to show the following format?
    
    ```
    original query
    ---------------------
    generated query
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62446/consoleFull)** for PR 14235 at commit [`e2a7ac4`](https://github.com/apache/spark/commit/e2a7ac445e7f0288d384996dc0e36195e5849c5a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094871
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,12 +17,21 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A testsuite for LogicalPlan-to-SQL conversion.
    --- End diff --
    
    Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62486/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    The followings are updated.
    - Adds a heading line in golden SQL files.
    - Adds more documents and replace `HiveQL` with `SQL`. The remaining `hive` are only the one in test input SQL.
    - Adds a testcase for `checkSQL`: SQL Parsing, Test File Existence, SQL Generation.
      - Only execution result checking is omitted, because the query structure is already the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    LGTM.
    
    One thing is that I feel most of the times the SQL comparison assertion may fail due to reasonable internal changes that somehow affect SQL generation in no harmful ways, and can be fixed by simply regenerating the golden files. That said, shall we add instructions about how to regenerate the golden files when the the assertion fails?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    :) Sorry, but I don't think so.
    * At every level, we need to prove correctness. The tolerance you want is the result of unpredictable removal of Optimizer.
    * Also, this is `LogicalPlanToSQLSuite` . SQL statement comparison is not a good way, but the only correct way to this module.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    I can understand the advantage of comparing SQL statement strings, but the cost is higher than the benefit especially for a small group of key developers who are maintaining the Spark code everyday. 
    
    In some commercial RDBMS, they have a whole team for developing and maintaining SQL generation. However, I am not sure if the Spark community can afford it. You know, SQL generation is only used for native view support so far.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62423/consoleFull)** for PR 14235 at commit [`9f20a63`](https://github.com/apache/spark/commit/9f20a63d9be027bcb25e1eeecb28658976cdae98).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Since the purpose of the function here is to generate SQL, I'd say it's important to actually show the generated SQL in reviews.
    
    That said, if we can also get something even more automated (e.g. asserting that the optimized plan equals) in addition to this, then it'd be even more robust!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    For the self unittest, sure. I did it manually, but it would be better if we had it, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095137
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    +    val df = sql(sqlString)
     
         val convertedSQL = try new SQLBuilder(df).toSQL catch {
           case NonFatal(e) =>
             fail(
               s"""Cannot convert the following HiveQL query plan back to SQL query string:
                  |
                  |# Original HiveQL query string:
    --- End diff --
    
    Yep.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    I'm not sure what cost exactly do you mean.
    It's a flag, isn't it? Also, if someone change SQLBuilder, it should be tested by the generated SQLs, not the query execution result.
    For example, BROADCAST hints will be tested here as comments after merging this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100950
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -101,95 +149,125 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |# Converted SQL query string:
                |$convertedSQL
                |
    -           |# Original HiveQL query string:
    -           |$hiveQl
    +           |# Original SQL query string:
    +           |$sqlString
                |
                |# Resolved query plan:
                |${df.queryExecution.analyzed.treeString}
              """.stripMargin, cause)
         }
       }
     
    +  test("checkSQL") {
    --- End diff --
    
    Actually maybe these should be 4 test cases instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094825
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,12 +17,21 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A testsuite for LogicalPlan-to-SQL conversion.
    --- End diff --
    
    also add something like
    
    "Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has built-in functionality to automatically generate these golden files. To do so, set the saveQuery flag to true and
    run the test suite. After generating the golden files, make sure to set the flag to false."
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71092240
  
    --- Diff: sql/hive/src/test/resources/sqlgen/agg1.sql ---
    @@ -0,0 +1,3 @@
    +SELECT COUNT(value) FROM parquet_t1 GROUP BY key HAVING MAX(key) > 0
    +--------------------------------------------------------------------------------
    +SELECT `gen_attr` AS `count(value)` FROM (SELECT `gen_attr` FROM (SELECT count(`gen_attr`) AS `gen_attr`, max(`gen_attr`) AS `gen_attr` FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS gen_subquery_0 GROUP BY `gen_attr` HAVING (`gen_attr` > CAST(0 AS BIGINT))) AS gen_subquery_1) AS gen_subquery_2
    --- End diff --
    
    can you add a new line to the end? this little red thing on github is annoying


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62469/consoleFull)** for PR 14235 at commit [`efaa4d0`](https://github.com/apache/spark/commit/efaa4d0d55373280e19ed38b7e192545e4a3a6af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095641
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    Yes. It seems that only `gen_subquery_XXX` is stable and `gen_attr_YYY` is not, currently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Oh, sure. I'll update the classdoc and change the file format.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun can you take a look at @gatorsmile's suggestion to check optimized logical plan and see if it is feasible?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    hm the problem with this approach is that we'd need to spend a lot of time to update test cases whenever we change sql generation slightly. I think in order to do this, we should put the generated sql in  files and then have a way to regenerate all the sql queries in bulk.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71096013
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    --- End diff --
    
    That's good.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62450/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62444/consoleFull)** for PR 14235 at commit [`6df0c18`](https://github.com/apache/spark/commit/6df0c187245f1329de2732a844d6edede9440fb4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094937
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    --- End diff --
    
    When is answerFile null? You should document this. Also why does it default to null?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095072
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    are we not using stable identifiers for those? How come we have this normalization process?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095681
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    Actually, as you guess in the function name, it compares the `Structure`.
    We remove the id here, but the alias is remaining. So, the output column names are checked correctly for aliases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62423/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100846
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    Can we make gen_attr_YYY stable? (can submit a separate pr)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    When the optimized plans are different, we need to check whether the optimized plan of the generated SQL perform worse than the original one. If so, we need to improve the SQL generation logics. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71092270
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,7 +85,34 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkHiveQl(hiveQl: String, answerFile: String = null): Unit = {
    --- End diff --
    
    while you at it, rename checkHiveQl to something else (e.g. checkSQL).
    
    Also why should answerFile be null?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Comparing the SQL statement strings is horrible to me. 
    
    I have a different approach to verify the **correctness**. How about compare the optimized plans, which can tolerate more slight changes that do not affect the correctness? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094899
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    Sorry, what do you mean? The omitting number from gen_attr_123?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71100930
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    --- End diff --
    
    Sure. I think so. It would be great issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094995
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    +    val df = sql(sqlString)
     
         val convertedSQL = try new SQLBuilder(df).toSQL catch {
           case NonFatal(e) =>
             fail(
               s"""Cannot convert the following HiveQL query plan back to SQL query string:
                  |
                  |# Original HiveQL query string:
    --- End diff --
    
    Hive -> SQL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094958
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -102,7 +140,7 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
                |$convertedSQL
                |
                |# Original HiveQL query string:
    --- End diff --
    
    HiveQL -> SQL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Oh, thank you for merging, @rxin ! Also, thank you for review, @gatorsmile and @liancheng .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71216783
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
    @@ -45,7 +46,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
       def this(df: Dataset[_]) = this(df.queryExecution.analyzed)
     
       private val nextSubqueryId = new AtomicLong(0)
    +  private val nextGenAttrId = new AtomicLong(0)
       private def newSubqueryName(): String = s"gen_subquery_${nextSubqueryId.getAndIncrement()}"
    +  private val exprIdMap: Map[Long, Long] = Map.empty[Long, Long]
    +  private def normalizedName(n: NamedExpression): String = synchronized {
    +    "gen_attr_" + exprIdMap.getOrElseUpdate(n.exprId.id, nextGenAttrId.getAndIncrement())
    +  }
    --- End diff --
    
    Now, the generated attributes also have stable identifiers like generated subqueries.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    It's definitely a solution. In fact, I tried that first before making this PR, of course, not completely, but as a feasibility tests. It has its own pros and cons, too.
    
    For example, if we add new optimizers like `OptimizeIn` by removing duplications, we should change here too. (If there is a case here.) This is the same situation because we didn't check the any plans or queries until now. We only check the parseability and the result.
    
    I'm sure that all attempts beyond the existing suite have the same side-effects.
    
    This PR aims to ensure the correctness before making the changes on SQLBuilder and Hint, or other refactoring. It has its meaning since the current testsuite is fragile as we know.
    
    You may want to change the level of importance or test coverage for any convenience. I understand the intention of @gatorsmile , too. 
    
    But, IMHO, it's difficult to refactor some module without a proper testsuite. At the worst case, your can remove the second parameters easily when we reach more stable level. It's similar with changing `test()` into `ignore()`.
    
    Lastly, let me suggest like this. My opinion is having this now for further refactoring. After all SQL functions moves into each LogicalPlans unittests, we can downgrade this to compare optimized plan in most cases (not all cases).
    
    How do you think about this way, @rxin and @gatorsmile ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71111090
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,15 +17,32 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, NoSuchFileException, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
    +import org.apache.spark.sql.catalyst.parser.ParseException
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A test suite for LogicalPlan-to-SQL conversion.
    + *
    + * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
    + * built-in functionality to automatically generate these golden files.
    + *
    + * To re-generate golden files, run:
    + *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
    + */
     class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
       import testImplicits._
     
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = System.getenv("SPARK_GENERATE_GOLDEN_FILES") != null
    --- End diff --
    
    also rename saveQuery to regenerateGoldenFiles


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62446/consoleFull)** for PR 14235 at commit [`e2a7ac4`](https://github.com/apache/spark/commit/e2a7ac445e7f0288d384996dc0e36195e5849c5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71257655
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,15 +17,33 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, NoSuchFileException, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
    +import org.apache.spark.sql.catalyst.parser.ParseException
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A test suite for LogicalPlan-to-SQL conversion.
    + *
    + * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
    + * built-in functionality to automatically generate these golden files.
    + *
    + * To re-generate golden files, run:
    + *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
    + */
     class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
       import testImplicits._
     
    +  // Used for generating new query answer files by saving
    +  private val regenerateGoldenFiles =
    +    Option(System.getenv("SPARK_GENERATE_GOLDEN_FILES")).contains("1")
    --- End diff --
    
    I fixed it here https://github.com/apache/spark/commit/c4524f5193e1b3ce1c56c5aed126f4121ce26d23


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Regenerating SQL statements can simplify the work, but the reviewers have to be careful when reviewing the statement changes. Those statements could be very complex and hard to read.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62487/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62450/consoleFull)** for PR 14235 at commit [`a9a1b00`](https://github.com/apache/spark/commit/a9a1b00411fa6175dd1e628a3485a4313999b636).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71096037
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    just use a dummy file (e.g. sqlgen/README.md) to get the path using getResource then.
    
    It's probably good to have README.md in sqlgen to explain what that folder is anyway.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun what problem did you run into when you tried comparing optimized plans?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101535
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    I'll try again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    First, we will make more HINTS. I made a general HINT syntax for that. We are opening the door together.
    
    Second, when you add your new testcase here, the additional cost is just **one second** to generate the file.
    
    Third, when you develops your PR, this will save much time. I'm already using this to save my time. At this time, we need to verify correctness at every level. If you use optimized result or the execution result, that is a naive test. You know that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    @dongjoon-hyun please also implement unit tests for the checking functionality. In particular, we want to make sure the test actually fails if the generated SQL does not match the golden file.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Hmm, HiveCompatibilitySuite has some dependency on `gen_attr_`. I reverted the last attempt.
    I will try as another PR as planed before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71103029
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    While making that, I feel this becomes much convenient. Thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Oh, thank you for fast review!
    Yep. I will update like that.
    The purpose of this PR is having stronger `LogicalPlanToSQLSuite`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Hi, @rxin and @liancheng .
    I will update this PR one more time. Please wait a moment.
    I can use stable identifiers for `gen_attr`, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62487/consoleFull)** for PR 14235 at commit [`ee5b747`](https://github.com/apache/spark/commit/ee5b747526669956d5de58e7a58d5859ccfb2a2c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62423/consoleFull)** for PR 14235 at commit [`9f20a63`](https://github.com/apache/spark/commit/9f20a63d9be027bcb25e1eeecb28658976cdae98).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Now, the remaining issue is using `getResource` to save the golden files. I left a comment about that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Thank you for guiding me in this PR, @rxin .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62487/consoleFull)** for PR 14235 at commit [`ee5b747`](https://github.com/apache/spark/commit/ee5b747526669956d5de58e7a58d5859ccfb2a2c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62426/consoleFull)** for PR 14235 at commit [`4165581`](https://github.com/apache/spark/commit/4165581c6b13e6dc93acea0e35a3cc1550ccfb57).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62426/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71095089
  
    --- Diff: sql/hive/src/test/resources/sqlgen/agg1.sql ---
    @@ -0,0 +1,3 @@
    +SELECT COUNT(value) FROM parquet_t1 GROUP BY key HAVING MAX(key) > 0
    --- End diff --
    
    It looks much better. Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62448/consoleFull)** for PR 14235 at commit [`38f52ce`](https://github.com/apache/spark/commit/38f52cea6fbd53cf66429d396663708afcb1f193).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Thank you, @rxin . By the way, the following test occurs two times sequentially.
    ```
    HiveSparkSubmitSuite.SPARK-8020: set sql conf in spark conf *** FAILED *** (5 minutes, 0 seconds)
    ```
    I looked into the cases, but still have no idea about that. I wish the Jenkins pass at this time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71101173
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +88,58 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val header = "-- This file is automatically generated by LogicalPlanToSQLSuite."
    +        val answerText = s"$header\n${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    --- End diff --
    
    No problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62420/consoleFull)** for PR 14235 at commit [`a3bb306`](https://github.com/apache/spark/commit/a3bb306af576834d93394d4ac636c5b52b6d745a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71102174
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    --- End diff --
    
    Also instead of using a variable as a flag, maybe we can use an environmental variable, and then document in the beginning
    
    ```
    * To re-generate golden files, run:  SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-Only * LogicalPlanToSQLSuite"
    ```
    
    something like this?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Checking the generated SQL is a good thing to do - we just need to make sure it is maintainable (i.e. have tooling to automatically regenerate results).
    
    Once we are done with this, we should also do similar things for the parser -- I think the unit tests for parsers don't currently check for the plan, but only checks whether the parser can parse it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71257848
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,15 +17,33 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, NoSuchFileException, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
    +import org.apache.spark.sql.catalyst.parser.ParseException
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A test suite for LogicalPlan-to-SQL conversion.
    + *
    + * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
    + * built-in functionality to automatically generate these golden files.
    + *
    + * To re-generate golden files, run:
    + *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
    + */
     class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
       import testImplicits._
     
    +  // Used for generating new query answer files by saving
    +  private val regenerateGoldenFiles =
    +    Option(System.getenv("SPARK_GENERATE_GOLDEN_FILES")).contains("1")
    --- End diff --
    
    Yep. I have nothing to say. My bad. Sorry about this. :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62493/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62469/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Thank you for review, @liancheng . Sure. Currently, it is only documented in class doc. I think you are suggesting to have that in some HTML or Wiki(Confluence). Did I understand your advice correctly?
    ```scala
    /**
     * A test suite for LogicalPlan-to-SQL conversion.
     *
     * Each query has a golden generated SQL file in test/resources/sqlgen. The test suite also has
     * built-in functionality to automatically generate these golden files.
     *
     * To re-generate golden files, run:
     *    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "hive/test-only *LogicalPlanToSQLSuite"
     */
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    **[Test build #62426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62426/consoleFull)** for PR 14235 at commit [`4165581`](https://github.com/apache/spark/commit/4165581c6b13e6dc93acea0e35a3cc1550ccfb57).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQLSuite t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    First of all, you can update the whole query set by one flag. So, maintainability is no more difficult issue.
    
    I made this PR because `SQL generation` is currently fragile as [you said yesterday](https://github.com/apache/spark/pull/14132#issuecomment-233107976).
    
    We need to prevent unintentional and accidental changes on that before both Hint or [SPARK-16576](https://issues.apache.org/jira/browse/SPARK-16576).
    
    IMO, this is a **correctness** issue we should resolve. I hope this PR protects Spark from **me**. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094999
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,22 +85,51 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    -    val df = sql(hiveQl)
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    +
    +  /**
    +   * Compare the generated SQL with the expected answer string.
    +   * Note that there exists a normalization for both arguments for the convenience.
    +   * - Remove the id from the generated attributes, e.g., `gen_attr_1` -> `gen_attr`.
    +   */
    +  private def checkSQLStructure(originalSQL: String, convertedSQL: String, answerFile: String) = {
    +    val normalizedGenSQL = convertedSQL.replaceAll("`gen_attr_\\d+`", "`gen_attr`")
    +    if (answerFile != null) {
    +      val path = s"src/test/resources/sqlgen/$answerFile.sql"
    +      val separator = "-" * 80
    +      if (saveQuery) {
    +        val answerText = s"${originalSQL.trim()}\n${separator}\n$normalizedGenSQL\n"
    +        Files.write(Paths.get(path), answerText.getBytes(StandardCharsets.UTF_8))
    +      } else {
    +        val answerText = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    +        val sqls = answerText.split(separator)
    +        if (sqls.length == 2) {
    +          val normalizedExpectSQL = sqls(1).trim()
    +          assert(normalizedGenSQL == normalizedExpectSQL)
    +        }
    +      }
    +    }
    +  }
    +
    +  private def checkSQL(sqlString: String, answerFile: String = null): Unit = {
    +    val df = sql(sqlString)
     
         val convertedSQL = try new SQLBuilder(df).toSQL catch {
           case NonFatal(e) =>
             fail(
               s"""Cannot convert the following HiveQL query plan back to SQL query string:
                  |
                  |# Original HiveQL query string:
    --- End diff --
    
    Can you find other instances of HiveQL and replace it with SQL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71076593
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -76,7 +79,29 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
         }
       }
     
    -  private def checkHiveQl(hiveQl: String): Unit = {
    +  // Used for generating new query answer files by saving
    +  private val saveQuery = false
    --- End diff --
    
    Hi, @gatorsmile .
    This is my answer to that. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Looks pretty good now. Just couple minor comments.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14235


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14235: [SPARK-16590][SQL][TEST] Improve LogicalPlanToSQL...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14235#discussion_r71094791
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala ---
    @@ -17,12 +17,21 @@
     
     package org.apache.spark.sql.catalyst
     
    +import java.nio.charset.StandardCharsets
    +import java.nio.file.{Files, Paths}
    +
     import scala.util.control.NonFatal
     
     import org.apache.spark.sql.Column
     import org.apache.spark.sql.functions._
     import org.apache.spark.sql.test.SQLTestUtils
     
    +/**
    + * A testsuite for LogicalPlan-to-SQL conversion.
    --- End diff --
    
    testsuite -> test suite



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Jenkins is restarted, but the current last commit is efaa4d0 having the passed Jenkins test. Could you review and merge this first if possible, @rxin ?
    ```
    Test build #62469 has finished for PR 14235 at commit efaa4d0.
    
    This patch passes all tests.
    This patch merges cleanly.
    This patch adds no public classes.
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14235: [SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to chec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14235
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62448/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org