You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2016/04/23 08:38:48 UTC

[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/12633

    [SPARK-14865][SQL] Better error handling for view creation.

    ## What changes were proposed in this pull request?
    This patch improves error handling in view creation. CreateViewCommand itself will analyze the view SQL query first, and if it cannot successfully analyze it, throw an AnalysisException.
    
    In addition, I also added the following two conservative guards for easier identification of Spark bugs:
    
    1. If there is a bug and the generated view SQL cannot be analyzed, throw an exception at runtime. Note that this is not an AnalysisException because it is not caused by the user and more likely indicate a bug in Spark.
    2. SQLBuilder when it gets an unresolved plan, it will also show the plan in the error message.
    
    I also took the chance to simplify the internal implementation of CreateViewCommand, and *removed* a fallback path that would've masked an exception from before.
    
    ## How was this patch tested?
    Updated unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-14865

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12633
    
----
commit 2d4ddc803e7676d8d5a778ce2a09f63217835b2b
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-04-23T06:34:28Z

    [SPARK-14865][SQL] Better error handling for view creation.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12633#discussion_r60832575
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -85,68 +88,74 @@ case class CreateViewCommand(
         } else {
           // Create the view if it doesn't exist.
           sessionState.catalog.createTable(
    -        prepareTable(sqlContext, analzyedPlan), ignoreIfExists = false)
    +        prepareTable(sqlContext, analyzedPlan), ignoreIfExists = false)
         }
     
         Seq.empty[Row]
       }
     
    -  private def prepareTable(sqlContext: SQLContext, analzyedPlan: LogicalPlan): CatalogTable = {
    -    val expandedText = if (sqlContext.conf.canonicalView) {
    -      try rebuildViewQueryString(sqlContext, analzyedPlan) catch {
    --- End diff --
    
    cc @liancheng @yhuai 
    
    I don't think we should ever catch an exception and silently fallback to some other codepath. Let's make sure we don't do that in the future ...



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213675764
  
    **[Test build #56785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56785/consoleFull)** for PR 12633 at commit [`2d4ddc8`](https://github.com/apache/spark/commit/2d4ddc803e7676d8d5a778ce2a09f63217835b2b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213690917
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56785/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213692659
  
    **[Test build #56786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56786/consoleFull)** for PR 12633 at commit [`9892ab5`](https://github.com/apache/spark/commit/9892ab5585fbc344ccbd656f38d3ab5edd04a36f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213690916
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213675496
  
    cc @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213692703
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213692704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56786/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213675544
  
    also cc @liancheng who wrote some of the original code.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213823889
  
    LGTM. Merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213676984
  
    **[Test build #56786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56786/consoleFull)** for PR 12633 at commit [`9892ab5`](https://github.com/apache/spark/commit/9892ab5585fbc344ccbd656f38d3ab5edd04a36f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12633#issuecomment-213690865
  
    **[Test build #56785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56785/consoleFull)** for PR 12633 at commit [`2d4ddc8`](https://github.com/apache/spark/commit/2d4ddc803e7676d8d5a778ce2a09f63217835b2b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12633


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14865][SQL] Better error handling for v...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12633#discussion_r60839961
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
    @@ -85,68 +88,74 @@ case class CreateViewCommand(
         } else {
           // Create the view if it doesn't exist.
           sessionState.catalog.createTable(
    -        prepareTable(sqlContext, analzyedPlan), ignoreIfExists = false)
    +        prepareTable(sqlContext, analyzedPlan), ignoreIfExists = false)
         }
     
         Seq.empty[Row]
       }
     
    -  private def prepareTable(sqlContext: SQLContext, analzyedPlan: LogicalPlan): CatalogTable = {
    -    val expandedText = if (sqlContext.conf.canonicalView) {
    -      try rebuildViewQueryString(sqlContext, analzyedPlan) catch {
    -        case NonFatal(e) => wrapViewTextWithSelect(analzyedPlan)
    +  /**
    +   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
    +   * SQL based on the analyzed plan, and also creates the proper schema for the view.
    +   */
    +  private def prepareTable(sqlContext: SQLContext, analyzedPlan: LogicalPlan): CatalogTable = {
    +    val viewSQL: String =
    +      if (sqlContext.conf.canonicalView) {
    +        val logicalPlan =
    +          if (tableDesc.schema.isEmpty) {
    +            analyzedPlan
    +          } else {
    +            val projectList = analyzedPlan.output.zip(tableDesc.schema).map {
    +              case (attr, col) => Alias(attr, col.name)()
    +            }
    +            sqlContext.executePlan(Project(projectList, analyzedPlan)).analyzed
    +          }
    +        new SQLBuilder(logicalPlan).toSQL
    +      } else {
    +        // When user specified column names for view, we should create a project to do the renaming.
    +        // When no column name specified, we still need to create a project to declare the columns
    +        // we need, to make us more robust to top level `*`s.
    +        val viewOutput = {
    +          val columnNames = analyzedPlan.output.map(f => quote(f.name))
    +          if (tableDesc.schema.isEmpty) {
    +            columnNames.mkString(", ")
    +          } else {
    +            columnNames.zip(tableDesc.schema.map(f => quote(f.name))).map {
    +              case (name, alias) => s"$name AS $alias"
    +            }.mkString(", ")
    +          }
    +        }
    +
    +        val viewText = tableDesc.viewText.get
    +        val viewName = quote(tableDesc.identifier.table)
    +        s"SELECT $viewOutput FROM ($viewText) $viewName"
           }
    -    } else {
    -      wrapViewTextWithSelect(analzyedPlan)
    +
    +    // Validate the view SQL - make sure we can parse it and analyze it.
    +    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
    +    try {
    +      sqlContext.sql(viewSQL).queryExecution.assertAnalyzed()
    +    } catch {
    +      case NonFatal(e) =>
    +        throw new RuntimeException(
    +          "Failed to analyze the canonicalized SQL. It is possible there is a bug in Spark.", e)
         }
     
    -    val viewSchema = {
    +    val viewSchema: Seq[CatalogColumn] = {
           if (tableDesc.schema.isEmpty) {
    -        analzyedPlan.output.map { a =>
    +        analyzedPlan.output.map { a =>
               CatalogColumn(a.name, a.dataType.simpleString)
             }
           } else {
    -        analzyedPlan.output.zip(tableDesc.schema).map { case (a, col) =>
    +        analyzedPlan.output.zip(tableDesc.schema).map { case (a, col) =>
               CatalogColumn(col.name, a.dataType.simpleString, nullable = true, col.comment)
             }
           }
         }
     
    -    tableDesc.copy(schema = viewSchema, viewText = Some(expandedText))
    -  }
    -
    -  private def wrapViewTextWithSelect(analzyedPlan: LogicalPlan): String = {
    -    // When user specified column names for view, we should create a project to do the renaming.
    -    // When no column name specified, we still need to create a project to declare the columns
    -    // we need, to make us more robust to top level `*`s.
    -    val viewOutput = {
    -      val columnNames = analzyedPlan.output.map(f => quote(f.name))
    -      if (tableDesc.schema.isEmpty) {
    -        columnNames.mkString(", ")
    -      } else {
    -        columnNames.zip(tableDesc.schema.map(f => quote(f.name))).map {
    -          case (name, alias) => s"$name AS $alias"
    -        }.mkString(", ")
    -      }
    -    }
    -
    -    val viewText = tableDesc.viewText.get
    -    val viewName = quote(tableDesc.identifier.table)
    -    s"SELECT $viewOutput FROM ($viewText) $viewName"
    -  }
    -
    -  private def rebuildViewQueryString(sqlContext: SQLContext, analzyedPlan: LogicalPlan): String = {
    -    val logicalPlan = if (tableDesc.schema.isEmpty) {
    -      analzyedPlan
    -    } else {
    -      val projectList = analzyedPlan.output.zip(tableDesc.schema).map {
    -        case (attr, col) => Alias(attr, col.name)()
    -      }
    -      sqlContext.executePlan(Project(projectList, analzyedPlan)).analyzed
    -    }
    -    new SQLBuilder(logicalPlan).toSQL
    +    tableDesc.copy(schema = viewSchema, viewText = Some(viewSQL))
    --- End diff --
    
    Everything looks great to me. I only have a comment that is not directly related to this PR.
    
    It looks a little bit strange to generate a partial `CatalogTable` for a `view` in [`createView`](https://github.com/apache/spark/blob/c06110187b3e41405fc13aba367abdd4183ed9a6/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L1164). When we describe this parameter `tableDesc` in `CreateViewCommand`, we said it is `the catalog table`. Actually, this is not right. Then, here, we generate the final `CatalogTable` by updating two of the values `schema` and `viewText` with the correct values. 
    
    Anyway, this is not a big deal at all. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org