You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2017/01/12 10:27:08 UTC

[GitHub] spark pull request #16561: [SPARK-18209][SQL] Alias the view with its child ...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/16561

    [SPARK-18209][SQL] Alias the view with its child by mapping the columns by index

    ## What changes were proposed in this pull request?
    
    This PR is a follow-up to address the comments https://github.com/apache/spark/pull/16233/files#r95669988 and https://github.com/apache/spark/pull/16233/files#r95662299.
    
    We should alias the view with its child by mapping the columns by index, which follows the behavior in Hive. If the dataType of the outputs from the view and the child don't match up, we should throw an Exception.
    
    ## How was this patch tested?
    
    Add new test cases in `SQLViewSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark alias-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16561
    
----
commit 0e823409f1ff4689883408992c5a3a739d39a8d2
Author: jiangxingbo <ji...@gmail.com>
Date:   2017-01-12T10:06:04Z

    alias view with child by map the columns by index.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18209][SQL] Alias the view with its child by mapp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL] Alias the view with its child ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95817334
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    can you check hive's behavior? maybe we can use `UpCast` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)** for PR 16561 at commit [`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96159390
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -198,9 +203,44 @@ case class CatalogTable(
     
       /**
        * Return the default database name we use to resolve a view, should be None if the CatalogTable
    -   * is not a View.
    +   * is not a View or created by older versions of Spark(before 2.2.0).
    +   */
    +  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
    +
    +  /**
    +   * Return the output column names of the query that creates a view, the column names are used to
    +   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
    +   * of Spark(before 2.2.0).
    +   */
    +  def viewQueryColumnNames: Seq[String] = {
    +    for {
    +      numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
    --- End diff --
    
    It is needed to generate the correct output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71416/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96313495
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,60 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name(This is mostly for handling view queries like SELECT * FROM ..., the
    + *         schema of the referenced table/view may change after the view has been created, so we
    + *         have to save the output of the query to `viewQueryColumnNames`, and restore them during
    + *         view resolution, in this way, we are able to get the correct view column ordering and
    + *         omit the extra columns that we don't require);
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved && output != child.output =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      val queryOutput = if (queryColumnNames.nonEmpty) {
    +        // If the view output doesn't have the same number of columns with the query column names,
    +        // throw an AnalysisException.
    +        if (output.length != queryColumnNames.length) {
    +          throw new AnalysisException(
    +            s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +              s"columns with the query column names ${queryColumnNames.mkString("[", ",", "]")}")
    +        }
    +        desc.viewQueryColumnNames.map { colName =>
    +          findAttributeByName(colName, child.output, resolver)
    +        }
    +      } else {
    +        // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    +        // output is the same with the view output.
    +        child.output
    +      }
    +      // Map the attributes in the query output to the attributes in the view output by index.
    +      val newOutput = output.zip(queryOutput).map {
    --- End diff --
    
    Seems we need to check the size of `output` and `queryOutput`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96331626
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,60 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name(This is mostly for handling view queries like SELECT * FROM ..., the
    + *         schema of the referenced table/view may change after the view has been created, so we
    + *         have to save the output of the query to `viewQueryColumnNames`, and restore them during
    + *         view resolution, in this way, we are able to get the correct view column ordering and
    + *         omit the extra columns that we don't require);
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved && output != child.output =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      val queryOutput = if (queryColumnNames.nonEmpty) {
    +        // If the view output doesn't have the same number of columns with the query column names,
    +        // throw an AnalysisException.
    +        if (output.length != queryColumnNames.length) {
    +          throw new AnalysisException(
    +            s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +              s"columns with the query column names ${queryColumnNames.mkString("[", ",", "]")}")
    +        }
    +        desc.viewQueryColumnNames.map { colName =>
    +          findAttributeByName(colName, child.output, resolver)
    +        }
    +      } else {
    +        // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    +        // output is the same with the view output.
    +        child.output
    +      }
    +      // Map the attributes in the query output to the attributes in the view output by index.
    +      val newOutput = output.zip(queryOutput).map {
    --- End diff --
    
    For views created by older versions of Spark, the view text is fully qualified, so the output is the same with the view output. Or else we have checked that the output have the same length with `queryColumnNames`. So perhaps we don't need to check the size of `output` and `queryOutput` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71400/testReport)** for PR 16561 at commit [`7e19803`](https://github.com/apache/spark/commit/7e19803f6b24ae0b1143951d1d421ead4612f36f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71400/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71426/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157834
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
    @@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    -  test("correctly handle type casting between view output and child output") {
    +  test("resolve a view with custom column names") {
         withTable("testTable") {
    +      spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable")
           withView("testView") {
    -        spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("testTable")
    -        sql("CREATE VIEW testView AS SELECT * FROM testTable")
    +        val testView = CatalogTable(
    +          identifier = TableIdentifier("testView"),
    +          tableType = CatalogTableType.VIEW,
    +          storage = CatalogStorageFormat.empty,
    +          schema = new StructType().add("x", "long").add("y", "long"),
    --- End diff --
    
    let's think about how we will persistent view when we have custom column names. Ideally we will have a logical plan representing the view, a SQL statement of the view query, and a `Seq[String]` for the custom column names.
    
    1. call `plan.schema` to get the view schema, and zip it with custom column names, to get the final schema and save it. Then use `plan.schema.map(_.name)` to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` in table properties.
    2. call `plan.schema` to get the view schema, and save it as the final schema. Then use custom column names to generate the `VIEW_QUERY_OUTPUT_COLUMN_NAME` in table properties.
    
    Personally I think option 2 is more natural, what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16561


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157931
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
    @@ -680,21 +700,70 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    -  test("correctly handle type casting between view output and child output") {
    +  test("resolve a view with custom column names") {
         withTable("testTable") {
    +      spark.range(1, 10).selectExpr("id", "id + 1 id1").write.saveAsTable("testTable")
           withView("testView") {
    -        spark.range(1, 10).toDF("id1").write.format("json").saveAsTable("testTable")
    -        sql("CREATE VIEW testView AS SELECT * FROM testTable")
    +        val testView = CatalogTable(
    +          identifier = TableIdentifier("testView"),
    +          tableType = CatalogTableType.VIEW,
    +          storage = CatalogStorageFormat.empty,
    +          schema = new StructType().add("x", "long").add("y", "long"),
    +          viewOriginalText = Some("SELECT * FROM testTable"),
    +          viewText = Some("SELECT * FROM testTable"),
    +          properties = Map(CatalogTable.VIEW_DEFAULT_DATABASE -> "default",
    +            CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NUM -> "2",
    +            s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}0" -> "id",
    +            s"${CatalogTable.VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX}1" -> "id1"))
    +        hiveContext.sessionState.catalog.createTable(testView, ignoreIfExists = false)
    +
    +        // Correctly resolve a view with custom column names.
    +        checkAnswer(sql("SELECT * FROM testView ORDER BY x"), (1 to 9).map(i => Row(i, i + 1)))
    --- End diff --
    
    can we use `select x, y ...`? to test the custom column names really work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95947533
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    cc @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157263
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -254,6 +294,9 @@ case class CatalogTable(
     
     object CatalogTable {
       val VIEW_DEFAULT_DATABASE = "view.default.database"
    +  val VIEW_QUERY_OUTPUT_PREFIX = "view.query.out."
    +  val VIEW_QUERY_OUTPUT_COLUMN_NUM = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
    --- End diff --
    
    nit: `xxx_NUM_COLUMNS`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by QQshu1 <gi...@git.apache.org>.
Github user QQshu1 commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    hi , I have a question, why we should Eliminate View in the first of the optimizer.?
      thank you.@jiangxb1987


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18209][SQL] Alias the view with its child by mapp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71255/testReport)** for PR 16561 at commit [`0e82340`](https://github.com/apache/spark/commit/0e823409f1ff4689883408992c5a3a739d39a8d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18209][SQL] Alias the view with its child by mapp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71255/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)** for PR 16561 at commit [`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71415/testReport)** for PR 16561 at commit [`d6537a5`](https://github.com/apache/spark/commit/d6537a5d66ba88cb827a6fc84a7ec8be79af5277).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71437/testReport)** for PR 16561 at commit [`06e8855`](https://github.com/apache/spark/commit/06e8855fef77a2a14d96f6895dcb6e915a422722).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157302
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -198,9 +203,44 @@ case class CatalogTable(
     
       /**
        * Return the default database name we use to resolve a view, should be None if the CatalogTable
    -   * is not a View.
    +   * is not a View or created by older versions of Spark(before 2.2.0).
    +   */
    +  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
    +
    +  /**
    +   * Return the output column names of the query that creates a view, the column names are used to
    +   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
    +   * of Spark(before 2.2.0).
    +   */
    +  def viewQueryColumnNames: Seq[String] = {
    +    for {
    +      numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
    --- End diff --
    
    `.toSeq` is not needed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71434/testReport)** for PR 16561 at commit [`c86ab48`](https://github.com/apache/spark/commit/c86ab48c90fe1b9185021b37484a8d0a45f9a18e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71400/testReport)** for PR 16561 at commit [`7e19803`](https://github.com/apache/spark/commit/7e19803f6b24ae0b1143951d1d421ead4612f36f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71416/testReport)** for PR 16561 at commit [`16ec310`](https://github.com/apache/spark/commit/16ec310d96471579af916716f6c99df60fd20bc5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157156
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -198,9 +203,44 @@ case class CatalogTable(
     
       /**
        * Return the default database name we use to resolve a view, should be None if the CatalogTable
    -   * is not a View.
    +   * is not a View or created by older versions of Spark(before 2.2.0).
    +   */
    +  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
    +
    +  /**
    +   * Return the output column names of the query that creates a view, the column names are used to
    +   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
    --- End diff --
    
    should be `Nil`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95947257
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    It sounds like Hive just forcefully cast it.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL] Alias the view with its child ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95823820
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    Seems that Hive supports UpCast between child output and view output, for example:
    ```
    hive> create table testtable as select 1 a, 2L b;
    hive> create view testview as select * from testtable;
    hive> select * from testview;
    OK
    1	2
    Time taken: 0.11 seconds, Fetched: 1 row(s)
    hive> alter table testtable change column a a bigint;
    hive> alter table testtable change column b b string;
    hive> desc testtable;
    OK
    a                   	bigint              	                    
    b                   	string              	                    
    Time taken: 0.15 seconds, Fetched: 2 row(s)
    hive> desc testview;
    OK
    a                   	int                 	                    
    b                   	bigint              	                    
    Time taken: 0.038 seconds, Fetched: 2 row(s)
    hive> select * from testview;
    OK
    1	2
    Time taken: 0.172 seconds, Fetched: 1 row(s)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96168283
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns with the child output and the
    +      // query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      // If the child output is the same with the view output, we don't need to generate the query
    +      // output again.
    +      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
    --- End diff --
    
    `output != child.output` will always be true right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71437/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    @QQshu1 As we have mentioned in the comment, the `View` operator is respected till the end of analysis stage to enable us better understand the analyzed logical plan. On the Beginning of optimize stage, that operator is no longer needed, so we apply the `EliminateView` rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71437/testReport)** for PR 16561 at commit [`06e8855`](https://github.com/apache/spark/commit/06e8855fef77a2a14d96f6895dcb6e915a422722).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96175295
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -89,6 +89,25 @@ object Cast {
         case _ => false
       }
     
    +  /**
    +   * Return false iff we may truncate during casting `from` type to `to` type. e.g. long -> int,
    +   * timestamp -> date.
    +   */
    +  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    --- End diff --
    
    how about `def mayTruncate`? `canUpCast` is not accurate, we may not be able to cast even `canUpCast` returns true.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71415/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95947477
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    ```
    hive> explain extended select * from testview;
    OK
    ABSTRACT SYNTAX TREE:
      
    TOK_QUERY
       TOK_FROM
          TOK_TABREF
             TOK_TABNAME
                testview
       TOK_INSERT
          TOK_DESTINATION
             TOK_DIR
                TOK_TMP_FILE
          TOK_SELECT
             TOK_SELEXPR
                TOK_ALLCOLREF
    
    
    STAGE DEPENDENCIES:
      Stage-0 is a root stage
    
    STAGE PLANS:
      Stage: Stage-0
        Fetch Operator
          limit: -1
          Processor Tree:
            TableScan
              alias: testtable
              Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
              GatherStats: false
              Select Operator
                expressions: a (type: bigint), b (type: tinyint)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
                ListSink
    ```
    
    **`expressions: a (type: bigint), b (type: tinyint)`**. I tried to alter the columns in the underlying tables to different types. I can see the types of view columns are always casted to the same one as the altered one
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18209][SQL] Alias the view with its child ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r95826222
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
      * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * with a Project and add an alias for each output attribute by mapping the child output by index,
    + * if the view output doesn't have the same number of columns with the child output, throw an
    + * AnalysisException.
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
         case v @ View(_, output, child) if child.resolved =>
    -      val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      if (output.length != child.output.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      val newOutput = output.zip(child.output).map {
    +        case (attr, originAttr) =>
    +          if (attr.dataType != originAttr.dataType) {
    --- End diff --
    
    What should we set for the `walkedTypePath` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71426/testReport)** for PR 16561 at commit [`21e63f8`](https://github.com/apache/spark/commit/21e63f8eb0540ff26c16804bffb222123a97c1c8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96168755
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns with the child output and the
    +      // query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      // If the child output is the same with the view output, we don't need to generate the query
    +      // output again.
    +      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
    --- End diff --
    
    Oh I think that's better!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96157384
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -198,9 +203,44 @@ case class CatalogTable(
     
       /**
        * Return the default database name we use to resolve a view, should be None if the CatalogTable
    -   * is not a View.
    +   * is not a View or created by older versions of Spark(before 2.2.0).
    +   */
    +  def viewDefaultDatabase: Option[String] = properties.get(VIEW_DEFAULT_DATABASE)
    +
    +  /**
    +   * Return the output column names of the query that creates a view, the column names are used to
    +   * resolve a view, should be None if the CatalogTable is not a View or created by older versions
    +   * of Spark(before 2.2.0).
    +   */
    +  def viewQueryColumnNames: Seq[String] = {
    +    for {
    +      numCols <- properties.get(VIEW_QUERY_OUTPUT_COLUMN_NUM).toSeq
    +      index <- 0 until numCols.toInt
    +    } yield properties.getOrElse(
    +      s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index",
    +      throw new AnalysisException("Corrupted view query output column names in catalog: " +
    +        s"$numCols parts expected, but part $index is missing.")
    +    )
    +  }
    +
    +  /**
    +   * Insert/Update the view query output column names in `properties`.
        */
    -  def viewDefaultDatabase: Option[String] = properties.get(CatalogTable.VIEW_DEFAULT_DATABASE)
    +  def withQueryColumnNames(columns: Seq[String]): CatalogTable = {
    +    val props = new mutable.HashMap[String, String]
    --- End diff --
    
    let's follow the existing code for partition columns
    ```
    properties.put(DATASOURCE_SCHEMA_NUMPARTCOLS, partitionColumns.length.toString)
          partitionColumns.zipWithIndex.foreach { case (partCol, index) =>
            properties.put(s"$DATASOURCE_SCHEMA_PARTCOL_PREFIX$index", partCol)
          }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71434/testReport)** for PR 16561 at commit [`c86ab48`](https://github.com/apache/spark/commit/c86ab48c90fe1b9185021b37484a8d0a45f9a18e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96167779
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    --- End diff --
    
    should we mention that, this is mostly for `SELECT * ...`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96168225
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns with the child output and the
    +      // query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    --- End diff --
    
    This condition doesn't look very clear to me. How about `if (queryColumnNames.nonEmpty && output.length != queryColumnNames.length)`? When `queryColumnNames` is empty, it means this view is created prior to Spark 2.2, and we don't need to check anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    the `View` operator doesn't have a corresponding physical operator, we have to remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18209][SQL] Alias the view with its child by mapp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    **[Test build #71255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71255/testReport)** for PR 16561 at commit [`0e82340`](https://github.com/apache/spark/commit/0e823409f1ff4689883408992c5a3a739d39a8d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96168416
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns with the child output and the
    +      // query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      // If the child output is the same with the view output, we don't need to generate the query
    +      // output again.
    +      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
    --- End diff --
    
    For a nested view, the inner view operator may have been resolved, in that case the output is the same with child.output.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71434/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with its chi...

Posted by QQshu1 <gi...@git.apache.org>.
Github user QQshu1 commented on the issue:

    https://github.com/apache/spark/pull/16561
  
    thanks, What effect if we don`t Eliminate View?I  means whether it effect  optimize the tree namely performance or the  correctness of results ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96158153
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns either with the child output,
    +      // or with the query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    --- End diff --
    
    the comment says `or` but the code use `&&`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16561: [SPARK-18801][SQL][FOLLOWUP] Alias the view with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16561#discussion_r96168453
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -28,22 +28,56 @@ import org.apache.spark.sql.catalyst.rules.Rule
      */
     
     /**
    - * Make sure that a view's child plan produces the view's output attributes. We wrap the child
    - * with a Project and add an alias for each output attribute. The attributes are resolved by
    - * name. This should be only done after the batch of Resolution, because the view attributes are
    - * not completely resolved during the batch of Resolution.
    + * Make sure that a view's child plan produces the view's output attributes. We try to wrap the
    + * child by:
    + * 1. Generate the `queryOutput` by:
    + *    1.1. If the query column names are defined, map the column names to attributes in the child
    + *         output by name;
    + *    1.2. Else set the child output attributes to `queryOutput`.
    + * 2. Map the `queryQutput` to view output by index, if the corresponding attributes don't match,
    + *    try to up cast and alias the attribute in `queryOutput` to the attribute in the view output.
    + * 3. Add a Project over the child, with the new output generated by the previous steps.
    + * If the view output doesn't have the same number of columns neither with the child output, nor
    + * with the query column names, throw an AnalysisException.
    + *
    + * This should be only done after the batch of Resolution, because the view attributes are not
    + * completely resolved during the batch of Resolution.
      */
     case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
       override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    -    case v @ View(_, output, child) if child.resolved =>
    +    case v @ View(desc, output, child) if child.resolved =>
           val resolver = conf.resolver
    -      val newOutput = output.map { attr =>
    -        val originAttr = findAttributeByName(attr.name, child.output, resolver)
    -        // The dataType of the output attributes may be not the same with that of the view output,
    -        // so we should cast the attribute to the dataType of the view output attribute. If the
    -        // cast can't perform, will throw an AnalysisException.
    -        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
    -          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
    +      val queryColumnNames = desc.viewQueryColumnNames
    +      // If the view output doesn't have the same number of columns with the child output and the
    +      // query column names, throw an AnalysisException.
    +      if (output.length != child.output.length && output.length != queryColumnNames.length) {
    +        throw new AnalysisException(
    +          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
    +            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
    +      }
    +      // If the child output is the same with the view output, we don't need to generate the query
    +      // output again.
    +      val queryOutput = if (queryColumnNames.nonEmpty && output != child.output) {
    +        desc.viewQueryColumnNames.map { colName =>
    +          findAttributeByName(colName, child.output, resolver)
    +        }
    +      } else {
    +        child.output
    +      }
    --- End diff --
    
    how about
    ```
    val queryOutput = if (queryColumnNames.nonEmpty) {
      if (output.length != queryColumnNames.length) throw ...
      desc.viewQueryColumnNames.map { colName =>
        findAttributeByName(colName, child.output, resolver)
      }
    } else {
      // For view created before Spark 2.1, the view text is already fully qualified, the plan output is view output.
      child.output
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org