Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2017/01/17 09:50:17 UTC

[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/16613

    [SPARK-19024][SQL] Implement new approach to write a permanent view

    ## What changes were proposed in this pull request?
    
    On CREATE/ALTER of a view, we no longer need to generate a SQL text string from the LogicalPlan; instead we store the SQL query text, the output column names of the query plan, and the current database in the CatalogTable. Permanent views created by this approach can be resolved by the current view resolution approach.
    
    The main advantages include:
    1. If you update an underlying view, the views defined on top of it also pick up the change;
    2. It gives us a chance to get rid of SQL generation for operators.
    
    Major changes in this PR:
    1. Generate the view-specific properties (e.g. view default database, view query output column names) during permanent view creation and store them as properties in the CatalogTable (a sketch of the resulting entries follows this list);
    2. Update the commands `CreateViewCommand` and `AlterViewAsCommand` to get rid of SQL generation in them.
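    
    For illustration, here is a rough sketch of the entries such a CREATE VIEW would record. The view, its columns, and the literal key strings are assumptions; the keys are written to match the `CatalogTable` constants (`VIEW_DEFAULT_DATABASE`, `VIEW_QUERY_OUTPUT_NUM_COLUMNS`, `VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX`) referenced in the diffs below:
    
    ```scala
    // Hypothetical view: CREATE VIEW v AS SELECT id, name FROM people,
    // created while the current database is "default".
    val viewProperties: Map[String, String] = Map(
      "view.default.database"  -> "default", // database to fall back to during view resolution
      "view.query.out.numCols" -> "2",       // number of query output columns
      "view.query.out.col.0"   -> "id",      // query output column names, in order
      "view.query.out.col.1"   -> "name"
    )
    ```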
    
    ## How was this patch tested?
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark view-write-path

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16613
    
----
commit 917ca040002fcb57e1947b2f619ad3e2b2fd32c2
Author: jiangxingbo <ji...@gmail.com>
Date:   2017-01-17T09:41:25Z

    implement view write path for the new approach.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96560699
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +276,93 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, originalText)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param viewText the query string used to create the child plan on view resolution.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    +      properties: Map[String, String],
    +      session: SparkSession,
    +      viewText: String): Map[String, String] = {
    +    // Try to analyze the viewText, throw an AnalysisException if the query is invalid.
    --- End diff --
    
    do we need to do this? the passed-in `query` is already the parsed and analyzed plan of viewText, isn't it?


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Is there a feature flag that determines whether we use this new approach? I feel it would be good to have an internal feature flag to select the code path, so that if something goes wrong that is hard to fix quickly before the release, we can still switch back to the old code path. Then, in the next release, we can remove the feature flag. What do you think?
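    
    For illustration, such an internal flag might look like the sketch below; the key name and wiring are hypothetical (and, per the follow-up later in this thread, no flag was ultimately added):
    
    ```scala
    // A sketch only: an internal fallback flag as it might appear in
    // org.apache.spark.sql.internal.SQLConf; the key name is made up.
    val NEW_VIEW_WRITE_PATH_ENABLED = buildConf("spark.sql.view.newWritePath.enabled")
      .internal()
      .doc("When false, fall back to the old SQL-generation-based view write path.")
      .booleanConf
      .createWithDefault(true)
    
    // CreateViewCommand / AlterViewAsCommand could then branch on it, e.g.:
    //   if (session.sessionState.conf.getConf(NEW_VIEW_WRITE_PATH_ENABLED)) newPath() else oldPath()
    ```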
    
    Also, @jiangxb1987 can you take a look at the SQLViewSuite and see if we have enough test coverage?


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71507/testReport)** for PR 16613 at commit [`9d582a4`](https://github.com/apache/spark/commit/9d582a453df92b2cc18df38acaaab0f621fac9c5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    LGTM, pending jenkins


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71587/testReport)** for PR 16613 at commit [`e2ccdd5`](https://github.com/apache/spark/commit/e2ccdd5689274e44289017822a2bf81de32dbd37).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71501/
    Test FAILed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71589/
    Test PASSed.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96587314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -173,7 +165,8 @@ case class CreateViewCommand(
           }
         } else {
           // Create the view if it doesn't exist.
    -      catalog.createTable(prepareTable(sparkSession, aliasedPlan), ignoreIfExists = false)
    +      catalog.createTable(prepareTable(sparkSession, analyzedPlan),
    +        ignoreIfExists = false)
    --- End diff --
    
    nit: put them in one line


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96580856
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +286,80 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    --- End diff --
    
    yea, will update that.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96561095
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -370,28 +370,35 @@ class HiveDDLSuite
           spark.range(10).write.saveAsTable(tabName)
           val viewName = "view1"
           withView(viewName) {
    +        def checkProperties(
    +            properties: Map[String, String],
    +            expected: Map[String, String]): Boolean = {
    +          properties.filterNot { case (key, value) =>
    +            Seq("transient_lastDdlTime", CatalogTable.VIEW_DEFAULT_DATABASE).contains(key) ||
    +              key.startsWith(CatalogTable.VIEW_QUERY_OUTPUT_PREFIX)
    --- End diff --
    
    shall we just filter out table properties that start with `CatalogTable.VIEW_QUERY_OUTPUT_PREFIX`?


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96563682
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +209,26 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
        */
    -  private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    -    val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
    -
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    +  private def prepareTable(session: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    +    if (originalText.isEmpty) {
    +      throw new AnalysisException(
    +        "It is not allowed to create a persisted view from the Dataset API")
         }
     
    +    val newProperties = generateViewProperties(properties, session, originalText.get)
    +
         CatalogTable(
           identifier = name,
           tableType = CatalogTableType.VIEW,
           storage = CatalogStorageFormat.empty,
           schema = aliasedPlan.schema,
    -      properties = properties,
    +      properties = newProperties,
           viewOriginalText = originalText,
    -      viewText = Some(viewSQL),
    +      viewText = originalText,
    --- End diff --
    
    yea, in a separate PR.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71589/testReport)** for PR 16613 at commit [`7c5b6af`](https://github.com/apache/spark/commit/7c5b6afe97676c42b62a8d2a92e98d120f81c96b).


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71501/testReport)** for PR 16613 at commit [`917ca04`](https://github.com/apache/spark/commit/917ca040002fcb57e1947b2f619ad3e2b2fd32c2).


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96561447
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +276,93 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, originalText)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param viewText the query string used to create the child plan on view resolution.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    +      properties: Map[String, String],
    +      session: SparkSession,
    +      viewText: String): Map[String, String] = {
    +    // Try to analyze the viewText, throw an AnalysisException if the query is invalid.
    --- End diff --
    
    Good catch! The `query` is not analyzed, perhaps we should use the `analyzedPlan`.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96580350
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +286,80 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    --- End diff --
    
    looks like all other methods in this class can be `private`?


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Thank you @cloud-fan !


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71587/
    Test PASSed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71589/testReport)** for PR 16613 at commit [`7c5b6af`](https://github.com/apache/spark/commit/7c5b6afe97676c42b62a8d2a92e98d120f81c96b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71579/
    Test FAILed.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96560936
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +209,26 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
        */
    -  private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    -    val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
    -
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    +  private def prepareTable(session: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    +    if (originalText.isEmpty) {
    +      throw new AnalysisException(
    +        "It is not allowed to create a persisted view from the Dataset API")
         }
     
    +    val newProperties = generateViewProperties(properties, session, originalText.get)
    +
         CatalogTable(
           identifier = name,
           tableType = CatalogTableType.VIEW,
           storage = CatalogStorageFormat.empty,
           schema = aliasedPlan.schema,
    -      properties = properties,
    +      properties = newProperties,
           viewOriginalText = originalText,
    -      viewText = Some(viewSQL),
    +      viewText = originalText,
    --- End diff --
    
    something we can clean up: Hive expands the view text, so it needs two fields, originalText and viewText (e.g. for `CREATE VIEW v AS SELECT * FROM t`, Hive would keep `SELECT * FROM t` as the original text but store an expanded form along the lines of `SELECT t.id FROM db.t` as the view text). Since we don't expand the view text, but only add table properties, the two fields now carry the same string, so I think we only need a single field `viewText` in CatalogTable.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16613


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    nvm. On second thought, the feature flag does not really buy us anything: we just store the original view definition and the column mapping in the metastore. So I think it is fine to just make the switch.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96587519
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +200,45 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * If `userSpecifiedColumns` is defined, alias the analyzed plan to the user specified columns,
    +   * else return the analyzed plan directly.
        */
    -  private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    -    val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
    -
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    +  private def aliasPlan(session: SparkSession, analyzedPlan: LogicalPlan): LogicalPlan = {
    +    if (userSpecifiedColumns.isEmpty) {
    +      analyzedPlan
    +    } else {
    +      val projectList = analyzedPlan.output.zip(userSpecifiedColumns).map {
    +        case (attr, (colName, None)) => Alias(attr, colName)()
    +        case (attr, (colName, Some(colComment))) =>
    +          val meta = new MetadataBuilder().putString("comment", colComment).build()
    +          Alias(attr, colName)(explicitMetadata = Some(meta))
    +      }
    +      session.sessionState.executePlan(Project(projectList, analyzedPlan)).analyzed
         }
    +  }
    +
    +  /**
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
    +   */
    +  private def prepareTable(session: SparkSession, analyzedPlan: LogicalPlan): CatalogTable = {
    +    if (originalText.isEmpty) {
    +      throw new AnalysisException(
    +        "It is not allowed to create a persisted view from the Dataset API")
    +    }
    +
    +    val aliasedPlan = aliasPlan(session, analyzedPlan)
    +    val newProperties = generateViewProperties(properties, session, analyzedPlan)
     
         CatalogTable(
           identifier = name,
           tableType = CatalogTableType.VIEW,
           storage = CatalogStorageFormat.empty,
           schema = aliasedPlan.schema,
    --- End diff --
    
    nit: inline the `aliasedPlan` here, i.e. `schema = aliasPlan(session, analyzedPlan).schema`. Then it's clearer that we only alias the plan to get the schema


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71579/testReport)** for PR 16613 at commit [`2d49ef2`](https://github.com/apache/spark/commit/2d49ef26936448dd70768562c4ef429542f56e4e).


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71507/
    Test PASSed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71501/testReport)** for PR 16613 at commit [`917ca04`](https://github.com/apache/spark/commit/917ca040002fcb57e1947b2f619ad3e2b2fd32c2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96563876
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +276,93 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, originalText)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param viewText the query string used to create the child plan on view resolution.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    +      properties: Map[String, String],
    +      session: SparkSession,
    +      viewText: String): Map[String, String] = {
    +    // Try to analyze the viewText, throw an AnalysisException if the query is invalid.
    --- End diff --
    
    yea, we should use analyzedPlan


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71587/testReport)** for PR 16613 at commit [`e2ccdd5`](https://github.com/apache/spark/commit/e2ccdd5689274e44289017822a2bf81de32dbd37).


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    cc @cloud-fan @yhuai @hvanhovell @gatorsmile 


[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    thanks, merging to master!




[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96587726
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -370,28 +370,35 @@ class HiveDDLSuite
           spark.range(10).write.saveAsTable(tabName)
           val viewName = "view1"
           withView(viewName) {
    +        def checkProperties(
    +            properties: Map[String, String],
    --- End diff --
    
    nit: we can make this method better
    ```
    def checkProperties(expected: Map[String, String]): Boolean = {
      val properties = catalog.getTableMetadata(TableIdentifier(viewName)).properties
      ...
    }
    ```
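
    One way the elided body could read, purely as a sketch: `spark` and
    `viewName` are assumed to come from the enclosing test suite, and the check
    is read as "every expected entry is present with the expected value".
    ```
    import org.apache.spark.sql.catalyst.TableIdentifier

    def checkProperties(expected: Map[String, String]): Boolean = {
      // Fetch the view's current properties from the session catalog and
      // verify every expected entry is present with the expected value.
      val properties = spark.sessionState.catalog
        .getTableMetadata(TableIdentifier(viewName)).properties
      expected.forall { case (key, value) => properties.get(key).contains(value) }
    }
    ```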




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96562319
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +209,26 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
        */
    -  private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    -    val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
    -
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    +  private def prepareTable(session: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
    +    if (originalText.isEmpty) {
    +      throw new AnalysisException(
    +        "It is not allowed to create a persisted view from the Dataset API")
         }
     
    +    val newProperties = generateViewProperties(properties, session, originalText.get)
    +
         CatalogTable(
           identifier = name,
           tableType = CatalogTableType.VIEW,
           storage = CatalogStorageFormat.empty,
           schema = aliasedPlan.schema,
    -      properties = properties,
    +      properties = newProperties,
           viewOriginalText = originalText,
    -      viewText = Some(viewSQL),
    +      viewText = originalText,
    --- End diff --
    
    I'm afraid that would require changes in several dozen places; should we do that in a separate PR?




[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16613
  
    **[Test build #71507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71507/testReport)** for PR 16613 at commit [`9d582a4`](https://github.com/apache/spark/commit/9d582a453df92b2cc18df38acaaab0f621fac9c5).




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96580275
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +210,35 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
    +   *
    +   * @param session the spark session.
    +   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
    +   *                    specified columns, else it is the same as the `analyzedPlan`.
    +   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
    --- End diff --
    
    why do we need both `aliasedPlan` and `analyzedPlan`?




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96580761
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -207,29 +210,35 @@ case class CreateViewCommand(
       }
     
       /**
    -   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    -   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
    +   * properties(e.g. view default database, view query output column names) and store them as
    +   * properties in the CatalogTable, and also creates the proper schema for the view.
    +   *
    +   * @param session the spark session.
    +   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
    +   *                    specified columns, else it is the same as the `analyzedPlan`.
    +   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
    --- End diff --
    
    We generate the `queryColumnNames` from the `analyzedPlan`, and the view schema from the `aliasedPlan`; they are not the same when `userSpecifiedColumns` is defined.
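
    A concrete illustration, with made-up table/view names:
    ```
    // With user-specified columns the two plans produce different names:
    spark.sql("CREATE TABLE t (col1 INT, col2 STRING) USING parquet")
    spark.sql("CREATE VIEW v (a, b) AS SELECT col1, col2 FROM t")
    // analyzedPlan output: col1, col2 -> stored as the view query column names
    // aliasedPlan output:  a, b       -> becomes the view schema
    ```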




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96560759
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -275,21 +276,93 @@ case class AlterViewAsCommand(
           throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
         }
     
    -    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
    -    // Validate the view SQL - make sure we can parse it and analyze it.
    -    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    -    try {
    -      session.sql(viewSQL).queryExecution.assertAnalyzed()
    -    } catch {
    -      case NonFatal(e) =>
    -        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
    -    }
    +    val newProperties = generateViewProperties(viewMeta.properties, session, originalText)
     
         val updatedViewMeta = viewMeta.copy(
           schema = analyzedPlan.schema,
    +      properties = newProperties,
           viewOriginalText = Some(originalText),
    -      viewText = Some(viewSQL))
    +      viewText = Some(originalText))
     
         session.sessionState.catalog.alterTable(updatedViewMeta)
       }
     }
    +
    +object ViewHelper {
    +
    +  import CatalogTable._
    +
    +  /**
    +   * Generate the view default database in `properties`.
    +   */
    +  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
    +    Map(VIEW_DEFAULT_DATABASE -> databaseName)
    +  }
    +
    +  /**
    +   * Generate the view query output column names in `properties`.
    +   */
    +  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
    +    val props = new mutable.HashMap[String, String]
    +    if (columns.nonEmpty) {
    +      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
    +      columns.zipWithIndex.foreach { case (colName, index) =>
    +        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
    +      }
    +    }
    +    props.toMap
    +  }
    +
    +  /**
    +   * Remove the view query output column names in `properties`.
    +   */
    +  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
    +    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
    +    // while `CatalogTable` should be serializable.
    +    properties.filterNot { case (key, _) =>
    +      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
    +    }
    +  }
    +
    +  /**
    +   * Generate the view properties in CatalogTable, including:
    +   * 1. view default database that is used to provide the default database name on view resolution.
    +   * 2. the output column names of the query that creates a view, this is used to map the output of
    +   *    the view child to the view output during view resolution.
    +   *
    +   * @param properties the `properties` in CatalogTable.
    +   * @param session the spark session.
    +   * @param viewText the query string used to create the child plan on view resolution.
    +   * @return new view properties including view default database and query column names properties.
    +   */
    +  def generateViewProperties(
    +      properties: Map[String, String],
    +      session: SparkSession,
    +      viewText: String): Map[String, String] = {
    +    // Try to analyze the viewText, throw an AnalysisException if the query is invalid.
    +    val queryPlan = try {
    +      val unresolvedPlan = session.sessionState.sqlParser.parsePlan(viewText)
    +      val resolvedPlan = session.sessionState.analyzer.execute(unresolvedPlan)
    +      session.sessionState.analyzer.checkAnalysis(resolvedPlan)
    +
    +      resolvedPlan
    +    } catch {
    +      case e: AnalysisException =>
    +        throw new AnalysisException(s"Failed to analyze the view query SQL: $viewText",
    +          cause = Some(e))
    +    }
    +
    +    // Generate the query column names, throw an AnalysisException if there exists duplicate column
    +    // names.
    +    val queryOutput = queryPlan.schema.fieldNames
    +    assert(queryOutput.toSet.size == queryOutput.size,
    --- End diff --
    
    nit: `queryOutput.distinct.size == queryOutput.size`
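
    Both forms catch duplicates; a tiny standalone illustration of the check:
    ```
    // `distinct` keeps encounter order and drops repeated names, so comparing
    // sizes flags any duplicate output column name.
    def hasDuplicateNames(names: Seq[String]): Boolean =
      names.distinct.size != names.size

    hasDuplicateNames(Seq("id", "name"))        // false
    hasDuplicateNames(Seq("id", "name", "id"))  // true
    ```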




[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16613#discussion_r96587879
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
    @@ -222,13 +222,14 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     
       test("correctly parse CREATE VIEW statement") {
    -    sql(
    -      """CREATE VIEW IF NOT EXISTS
    +    withView("testView") {
    +      sql(
    +        """CREATE VIEW IF NOT EXISTS
             |default.testView (c1 COMMENT 'blabla', c2 COMMENT 'blabla')
    --- End diff --
    
    nit: indentation is wrong
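
    i.e. the `|` margin should move with the extra nesting level introduced by
    `withView`; roughly like this, with the statement body shortened here for
    illustration:
    ```
    withView("testView") {
      sql(
        """CREATE VIEW IF NOT EXISTS
          |default.testView (c1 COMMENT 'blabla', c2 COMMENT 'blabla')
          |AS SELECT 1 AS c1, 2 AS c2
        """.stripMargin)
    }
    ```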

