Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/07/12 04:03:42 UTC

[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/14148

    [SPARK-16482] [SQL] Describe Table Command for Tables Requiring Runtime Inferred Schema 

    #### What changes were proposed in this pull request?
    If we create a table pointing to parquet/json datasets without specifying the schema, the describe table command does not show the schema at all. It only shows `# Schema of this table is inferred at runtime`. In 1.6, describe table does show the schema of such a table.
    
    For data source tables, to infer the schema, we need to load the data source tables at runtime. Thus, this PR calls the function `lookupRelation`.
    
    #### How was this patch tested?
    Added test cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark describeSchema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14148
    
----
commit 57893bdf55146c4ecd0a6d72c69ec3d3e85b5207
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-11T22:30:11Z

    fix

commit 6f2deb3405b119aff1c88cab19d3953a7ede0408
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-11T22:55:18Z

    another fix way

commit d92ebcdfd7e525499e0c8b491eeab416ad12ecfd
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-12T04:00:20Z

    another fix way

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    I was not talking about caching here. Caching is transient. I want the behavior to be the same regardless of how many times I'm restarting Spark ...
    
    And this has nothing to do with refresh. For tables in the catalog, NEVER change the schema implicitly, only do it when it is specified by the user.





[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14148




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62141/




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70578153
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
         } else {
           val metadata = catalog.getTableMetadata(table)
     
    +      if (DDLUtils.isDatasourceTable(metadata)) {
    +        DDLUtils.getSchemaFromTableProperties(metadata) match {
    +          case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
    +          case None => describeSchema(catalog.lookupRelation(table).schema, result)
    +        }
    +      } else {
    +        describeSchema(metadata.schema, result)
    +      }
    --- End diff --
    
    @yhuai I just gave it a try. We have to pass `CatalogTable` to avoid another call to `getTableMetadata`. We also need to pass `SessionCatalog` to call `lookupRelation`. Do you like this function, or should we keep the existing one? Thanks!
    
    ```Scala
      private def describeSchema(
          tableDesc: CatalogTable,
          catalog: SessionCatalog,
          buffer: ArrayBuffer[Row]): Unit = {
        if (DDLUtils.isDatasourceTable(tableDesc)) {
          DDLUtils.getSchemaFromTableProperties(tableDesc) match {
            case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, buffer)
            case None => describeSchema(catalog.lookupRelation(table).schema, buffer)
          }
        } else {
          describeSchema(tableDesc.schema, buffer)
        }
      }
    ```




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    LGTM, pending jenkins




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    ```
        Seq("parquet", "json", "orc").foreach { fileFormat =>
          withTable("t1") {
            withTempPath { dir =>
              val path = dir.getCanonicalPath
              spark.range(1).write.format(fileFormat).save(path)
              sql(s"CREATE TABLE t1(a int, b int) USING $fileFormat OPTIONS (PATH '$path') ")
              sql("select * from t1").show(false)
            }
          }
        }
    ```
    
    If users specify a schema that does not match the data, we do not check it. I think we should at least report an error as early as possible. For the formats `parquet` and `json`, the output is:
    ```
    +----+----+
    |a   |b   |
    +----+----+
    |null|null|
    +----+----+
    ```
    For the format `orc`, we got a stage failure:
    ```
    Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost): java.lang.IllegalArgumentException: Field "a" does not exist.
    ```
    
    Because this is a separate issue, I will submit a separate PR for it.
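
    As a rough illustration of what an early check might look like, here is a minimal sketch. It rests on assumptions: `Field`, `SchemaCheck`, and `findMismatches` are hypothetical names, and schemas are modelled as plain sequences rather than Spark's `StructType`. It is not how Spark performs (or would perform) this validation.

    ```scala
    // Hypothetical, simplified model of a schema field; not a Spark class.
    final case class Field(name: String, dataType: String)

    object SchemaCheck {
      // Returns one message for every user-specified field that is missing from,
      // or typed differently in, the schema found in the data files.
      def findMismatches(userSchema: Seq[Field], dataSchema: Seq[Field]): Seq[String] = {
        val actual = dataSchema.map(f => f.name -> f.dataType).toMap
        userSchema.flatMap { f =>
          actual.get(f.name) match {
            case None =>
              Some(s"""Field "${f.name}" does not exist in the data""")
            case Some(t) if t != f.dataType =>
              Some(s"""Field "${f.name}" is ${t} in the data but declared as ${f.dataType}""")
            case _ => None
          }
        }
      }
    }
    ```

    For the test above, the data written by `spark.range(1)` has a single `id` column of type bigint, so comparing the declared `(a int, b int)` against it would flag both `a` and `b` when the table is created, instead of returning nulls or failing at query time.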




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62141/consoleFull)** for PR 14148 at commit [`d92ebcd`](https://github.com/apache/spark/commit/d92ebcdfd7e525499e0c8b491eeab416ad12ecfd).




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62143/




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Thanks. Just FYI for future changes: when a table is added to the catalog (regardless of whether it is temporary, non-temporary, external, or internal), we should save its schema. We should not rely on schema inference every time the user runs a query, and the schema should not change depending on time or the underlying data. For tables in the catalog, the schema should be specified by the user. It is OK, as a convenience measure, for users to rely on schema inference during table creation, but it is not OK to rely on schema inference every time.
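
    To make the intended behavior concrete, here is a minimal sketch of a catalog that fixes the schema at creation time. It is only an illustration under assumptions: `Column`, `TableEntry`, `ToyCatalog`, and the `inferSchema` callback are hypothetical and do not correspond to Spark's `SessionCatalog` or external catalog.

    ```scala
    import scala.collection.mutable

    // Hypothetical, simplified stand-ins; none of these are Spark classes.
    final case class Column(name: String, dataType: String)
    final case class TableEntry(path: String, schema: Seq[Column])

    // A toy catalog that stores a table's schema when the table is created.
    final class ToyCatalog(inferSchema: String => Seq[Column]) {
      private val tables = mutable.Map.empty[String, TableEntry]

      // Inference runs at most once, here, and only when the user gave no schema.
      def createTable(
          name: String,
          path: String,
          userSchema: Option[Seq[Column]] = None): Unit = {
        val schema = userSchema.getOrElse(inferSchema(path))
        tables(name) = TableEntry(path, schema)
      }

      // Describing a table reads the stored schema; it never looks at the data again.
      def describe(name: String): Seq[Column] = tables(name).schema
    }
    ```

    With this shape, repeated calls to `describe` return the same schema across restarts as long as the stored entry is persisted, and the schema changes only when the user explicitly redefines the table.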




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Many interesting observations after further investigation. I will post the findings tonight. Thanks!




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Did a quick check. My understanding is wrong. We did the schema inference when creating the table. Let me fix it. Thanks!




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    @rxin The created table could be empty. Thus, we are unable to cover all the cases even if we try schema inference when creating tables. You know, this is just my understanding. No clue about the original intention. : )




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    @rxin The failed test case is interesting! The `REFRESH TABLE` command does not refresh the metadata stored in the external catalog. When the tables are data source tables, is this a bug?
    
    Please let me know if this is by design. Thanks!




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    It's easy to infer the schema once when we create the table and store it in the external catalog. However, it's a breaking change, which means users can't change the underlying data file schema after the table is created. It's a bad design we need to fix, but we also need to go through the code path to make sure we don't break other things.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Tomorrow, I will try to dig deeper and check whether schema evolution could be an issue if the schema is fixed when creating tables.




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70570489
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -431,7 +431,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
           val schema = DDLUtils.getSchemaFromTableProperties(table)
     
           if (schema.isEmpty) {
    -        append(buffer, "# Schema of this table is inferred at runtime", "", "")
    +        append(buffer, "# Schema of this table in catalog is corrupted", "", "")
    --- End diff --
    
    Should we just use `catalog.lookupRelation(table).schema` to get the schema?




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    LGTM. Merging to master and branch 2.0




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Shouldn't schema inference run as soon as the table is created?





[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    @cloud-fan, @gatorsmile, and @yhuai - how difficult would it be to change Spark so that it runs schema inference during table creation, and saves the table schema when we create the table? 




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62144/consoleFull)** for PR 14148 at commit [`a05383c`](https://github.com/apache/spark/commit/a05383c8ff4483dacdf34070173b965ab6f7d4ca).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70573373
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
         } else {
           val metadata = catalog.getTableMetadata(table)
     
    +      if (DDLUtils.isDatasourceTable(metadata)) {
    +        DDLUtils.getSchemaFromTableProperties(metadata) match {
    +          case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
    +          case None => describeSchema(catalog.lookupRelation(table).schema, result)
    +        }
    +      } else {
    +        describeSchema(metadata.schema, result)
    +      }
    --- End diff --
    
    Sure. Let me do it now
    
    BTW, previously, `describeExtended` and `describeFormatted` also contained the schema. Both called the original function `describe`.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    retest this please




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70570674
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -431,7 +431,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
           val schema = DDLUtils.getSchemaFromTableProperties(table)
     
           if (schema.isEmpty) {
    -        append(buffer, "# Schema of this table is inferred at runtime", "", "")
    +        append(buffer, "# Schema of this table in catalog is corrupted", "", "")
    --- End diff --
    
    Do you like the last patch? https://github.com/apache/spark/pull/14148/commits/d92ebcdfd7e525499e0c8b491eeab416ad12ecfd




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62217/consoleFull)** for PR 14148 at commit [`d92ebcd`](https://github.com/apache/spark/commit/d92ebcdfd7e525499e0c8b491eeab416ad12ecfd).




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70571914
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
         } else {
           val metadata = catalog.getTableMetadata(table)
     
    +      if (DDLUtils.isDatasourceTable(metadata)) {
    +        DDLUtils.getSchemaFromTableProperties(metadata) match {
    +          case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
    +          case None => describeSchema(catalog.lookupRelation(table).schema, result)
    +        }
    +      } else {
    +        describeSchema(metadata.schema, result)
    +      }
    --- End diff --
    
    How about we try to put these into `describeSchema`? Or maybe we can add a `describeSchema(tableName, result)`? It seems odd that `describeExtended` and `describeFormatted` do not contain the code for describing the schema.
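
A minimal sketch of the refactoring suggested above, not the actual patch. `Field`, `TableMeta`, `Catalog`, and `lookupSchema` are hypothetical stand-ins for Spark's catalog types and for `catalog.lookupRelation(table).schema`; the only point is that the schema lookup moves into an overload of `describeSchema` that takes the table name.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for Spark's catalog types; these are not the real API.
final case class Field(name: String, dataType: String)
final case class TableMeta(
    name: String,
    isDatasource: Boolean,
    storedSchema: Option[Seq[Field]], // schema saved in the table properties, if any
    hiveSchema: Seq[Field])           // schema kept in the metastore for non-data-source tables

class Catalog(tables: Map[String, TableMeta], inferSchema: String => Seq[Field]) {
  def getTableMetadata(name: String): TableMeta = tables(name)
  // Stands in for loading the relation and reading its schema at runtime.
  def lookupSchema(name: String): Seq[Field] = inferSchema(name)
}

object DescribeSketch {
  // Existing-style helper: append one (column, type) row per field.
  def describeSchema(schema: Seq[Field], buffer: ArrayBuffer[(String, String)]): Unit =
    schema.foreach(f => buffer += ((f.name, f.dataType)))

  // Suggested overload: resolve the schema from the table name, then describe it.
  def describeSchema(
      catalog: Catalog,
      tableName: String,
      buffer: ArrayBuffer[(String, String)]): Unit = {
    val meta = catalog.getTableMetadata(tableName)
    val schema =
      if (meta.isDatasource) {
        // Fall back to runtime inference only when no schema was stored.
        meta.storedSchema.getOrElse(catalog.lookupSchema(tableName))
      } else {
        meta.hiveSchema
      }
    describeSchema(schema, buffer)
  }
}
```

With this shape, the callers that describe a table only pass the table name, so the data-source branching lives in one place.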




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62217/




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    uh... I see what you mean. Agree. 




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62144/




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    @rxin Currently, we do not rerun schema inference when the metadata cache already contains the plan. As I understand it, that is the main reason we introduced the metadata cache in the first place.
    
    I think it is not hard to store the schema of data source tables in the external catalog (Hive metastore). However, `Refresh Table` only refreshes the metadata cache and the data cache; it does not update the schema stored in the external catalog. As long as we do not store the schema in the external catalog, that works well. If we do store it there, we also have to refresh the schema info in the external catalog.
    
    To implement your idea, I can submit a PR targeting release 2.1 tomorrow. We can discuss it in that separate PR.
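
A toy model of the concern described above, offered as a sketch rather than as Spark's implementation. `SchemaStore`, `createTable`, `schemaOf`, `refreshTable`, and `inferenceRuns` are hypothetical names. It assumes a refresh only drops the metadata-cache entry, so a schema persisted in the external catalog stays stale after the underlying data changes.

```scala
import scala.collection.mutable

// Toy model: `infer` plays the role of runtime schema inference over the data files.
final class SchemaStore(infer: String => Seq[String]) {
  private val metadataCache = mutable.Map.empty[String, Seq[String]]
  private val externalCatalog = mutable.Map.empty[String, Seq[String]]
  var inferenceRuns = 0

  private def runInference(table: String): Seq[String] = {
    inferenceRuns += 1
    infer(table)
  }

  // Optionally persist the schema inferred at creation time in the external catalog.
  def createTable(table: String, persistSchema: Boolean): Unit =
    if (persistSchema) externalCatalog(table) = runInference(table)

  // Cache hit: no inference. Cache miss: use the persisted schema, else infer.
  def schemaOf(table: String): Seq[String] =
    metadataCache.getOrElseUpdate(
      table,
      externalCatalog.getOrElse(table, runInference(table)))

  // Models a refresh that only invalidates the metadata cache entry.
  def refreshTable(table: String): Unit = {
    metadataCache.remove(table)
    ()
  }
}
```

In this model, a table created with `persistSchema = false` picks up new columns after `refreshTable`, while one created with `persistSchema = true` keeps returning the schema captured at creation until the external catalog entry is updated as well.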




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62144/consoleFull)** for PR 14148 at commit [`a05383c`](https://github.com/apache/spark/commit/a05383c8ff4483dacdf34070173b965ab6f7d4ca).




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62217/consoleFull)** for PR 14148 at commit [`d92ebcd`](https://github.com/apache/spark/commit/d92ebcdfd7e525499e0c8b491eeab416ad12ecfd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70570551
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -105,7 +105,7 @@ case class CreateDataSourceTableCommand(
         CreateDataSourceTableUtils.createDataSourceTable(
           sparkSession = sparkSession,
           tableIdent = tableIdent,
    -      userSpecifiedSchema = userSpecifiedSchema,
    +      userSpecifiedSchema = Some(dataSource.schema),
    --- End diff --
    
    I think this change is risky for 2.0, and it changes the existing behavior.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62143/consoleFull)** for PR 14148 at commit [`473b27d`](https://github.com/apache/spark/commit/473b27deeb49096ddd38f1b4d4ca03207aa9e025).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14148#discussion_r70570710
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
    @@ -105,7 +105,7 @@ case class CreateDataSourceTableCommand(
         CreateDataSourceTableUtils.createDataSourceTable(
           sparkSession = sparkSession,
           tableIdent = tableIdent,
    -      userSpecifiedSchema = userSpecifiedSchema,
    +      userSpecifiedSchema = Some(dataSource.schema),
    --- End diff --
    
    Agree. I will revert it to the last solution.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62141/consoleFull)** for PR 14148 at commit [`d92ebcd`](https://github.com/apache/spark/commit/d92ebcdfd7e525499e0c8b491eeab416ad12ecfd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    **[Test build #62143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62143/consoleFull)** for PR 14148 at commit [`473b27d`](https://github.com/apache/spark/commit/473b27deeb49096ddd38f1b4d4ca03207aa9e025).




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    Also cc @yhuai @cloud-fan @liancheng @marmbrus




[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14148
  
    @rxin @cloud-fan @yhuai Will do more investigation and submit a separate PR for solution review. Thanks!

