Posted to reviews@spark.apache.org by xwu0226 <gi...@git.apache.org> on 2016/04/21 19:41:03 UTC

[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

GitHub user xwu0226 opened a pull request:

    https://github.com/apache/spark/pull/12579

    [SPARK-14346][SQL] Show Create Table (Native)

    This is a rebased version of [#12132](https://github.com/apache/spark/pull/12132) and [#12406](https://github.com/apache/spark/pull/12406)
    
    ## What changes were proposed in this pull request?
    Allow users to issue the `SHOW CREATE TABLE` command natively in Spark SQL.
    -- For tables that are created by Hive, this command will display the DDL in Hive syntax. If the DDL includes a `CLUSTERED BY`, `SKEWED BY`, or `STORED BY` clause, a warning message will say that this DDL is not yet supported in Spark SQL native DDL.
    
    -- For tables that are created by datasource DDL, such as "`CREATE TABLE... USING ... OPTIONS (...)`", it will show the DDL in this syntax. 
    
    -- For tables that are created by the DataFrame API, such as "`df.write.partitionBy(...).saveAsTable(...)`", the command currently displays DDL with the syntax "CREATE TABLE.. USING...OPTIONS(...)". However, this syntax loses the partitioning information. It is proposed to display the create table statement in the DataFrame API format instead.
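
As a rough illustration of the three cases above, the following Python sketch picks an output form based on how a table was created. The `TableInfo` record, its field names, the `origin` values, and the warning text are all assumptions made for this example; none of it is the PR's code or a Spark API. Only the `df.write...saveAsTable(...)` string it renders refers to the real DataFrameWriter calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hive clauses that the PR description says should trigger a warning.
UNSUPPORTED_HIVE_CLAUSES = ("CLUSTERED BY", "SKEWED BY", "STORED BY")


@dataclass
class TableInfo:
    """Hypothetical table metadata, invented for this sketch."""
    name: str
    columns: List[Tuple[str, str]]
    origin: str  # assumed values: "hive", "datasource", or "dataframe"
    provider: str = ""
    options: Dict[str, str] = field(default_factory=dict)
    partition_columns: List[str] = field(default_factory=list)
    hive_ddl: str = ""  # DDL text in Hive syntax, for Hive-created tables


def show_create_table(table: TableInfo) -> Tuple[str, List[str]]:
    """Return (ddl_text, warnings) for the three cases described above."""
    warnings: List[str] = []

    if table.origin == "hive":
        # Case 1: show the Hive DDL, warning about clauses Spark SQL cannot run.
        for clause in UNSUPPORTED_HIVE_CLAUSES:
            if clause in table.hive_ddl.upper():
                warnings.append(f"{clause} is not supported in Spark SQL native DDL yet")
        return table.hive_ddl, warnings

    if table.origin == "dataframe":
        # Case 3: render the DataFrame API form so partitioning is not lost.
        parts = ", ".join(f'"{c}"' for c in table.partition_columns)
        partition_by = f".partitionBy({parts})" if parts else ""
        text = f'df.write.format("{table.provider}"){partition_by}.saveAsTable("{table.name}")'
        return text, warnings

    # Case 2: datasource DDL, CREATE TABLE ... USING ... OPTIONS (...).
    cols = ", ".join(f"{n} {t}" for n, t in table.columns)
    opts = ", ".join(f"{k} '{v}'" for k, v in table.options.items())
    ddl = f"CREATE TABLE {table.name} ({cols}) USING {table.provider}"
    if opts:
        ddl += f" OPTIONS ({opts})"
    return ddl, warnings
```

Returning the warnings alongside the text, rather than printing them, keeps the sketch easy to check; how the real command surfaces its warning is not specified in this thread.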
    
    ## How was this patch tested?
    Unit tests are created. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark show_create_table_3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12579
    
----
commit 0ebb0142e13db3ce8fb474ee5682528b0f87d2d2
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-02T01:46:16Z

    show create table DDL -- hive metastore table

commit 6d060be797d4127f0b86fa59c1bc848d75215533
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-02T06:01:46Z

    update upon review

commit 2799672162d715b209cad9a5c103d6f09692d8dc
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-02T18:19:26Z

    ignoring sqlContext temp table and considering datasource table ddl

commit 98c020aa9a5374861d1470fa0c305148e8314ada
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-04T21:54:32Z

    fix scala style issue

commit efd889821bf84e328ef6dd8d0b6a645729248251
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-04T22:40:26Z

    fix scala style issue in testcase

commit b370630f5827071bc5076e9b3fa9c92720b27eb2
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-05T01:31:46Z

    fix testcase for test failure

commit 8cb7a7299df84f2608b91b092a7df6795b85d41e
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-06T18:12:07Z

    continue the database ddl generation

commit 8b67d22c5ed8fd6b309df772e4a372e741acf630
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-08T20:57:12Z

    support datasource ddl

commit 9ab863fb7f8127d1acd083b1ba857f5c1fd2769c
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-08T22:04:05Z

    scala style fix

commit a40273c7989bebdf62b93ce6e604bb14cacce100
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-13T22:54:16Z

    merge the code committed by CREATE TABLE native support

commit d214a3b0c54641a6234ba39eef82b2b8ac4c87dd
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-14T23:49:03Z

    rework show create ddl based on new native supported create table DDL work

commit 1680ea0403f0d29185d9a3f8f81d15599be81aac
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-14T23:51:03Z

    Merge branch 'show_create_table_1' into show_create_table_2

commit fa8373c3fd2d27cf2b3356ee0214c8e04dfc0f36
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-15T02:03:41Z

    remove spaces

commit 5095b6c871de55e871c5ea606ade6ab0b2166627
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-15T16:24:53Z

    update upon review - use visitTableIdentifier

commit 15f226c7d4f195947cbb1acc341eaaae4072d4a6
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-20T18:28:29Z

    generate dataframe API create table for some datasource tables

commit 601867ae71cc370770deddd56cc8883b04dcf8ee
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-20T18:31:27Z

    synch up with master branch

commit 687f7aca56cf5c032ceac09c341b2dfd00129b8e
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-20T21:54:51Z

    update upon review

commit bf3512ba01e773a514350030cfa91087de10fc03
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-20T22:07:55Z

    synch up with latest change

commit ca44d67584f358bd588743d33de2b7d689df584d
Author: xin Wu <xi...@us.ibm.com>
Date:   2016-04-21T05:35:04Z

    synch up again

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-216763938
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214553367
  
    retest this please




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214472079
  
    @liancheng Thanks for triggering the test! I am looking into the test failure. 




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214466178
  
    Build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214419087
  
    test this please




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-213126393
  
    @yhuai @andrewor14 Thanks!




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-215303758
  
    @yhuai @liancheng, I see that PR [#12734](https://github.com/apache/spark/pull/12734) takes care of the PARTITIONED BY and CLUSTERED BY (with SORTED BY) clauses for the CTAS syntax, but not for the non-CTAS syntax. I now need to change my PR to adapt to this change, which means that the generated DDL will be something like `create table t1 (c1 int, ...) using .. options (..) partitioned by (..) clustered by (...) sorted by (...) into ... buckets`. There won't be a select clause following it, since we do not have the original query, so the generated statement will not run because [#12734](https://github.com/apache/spark/pull/12734) does not support it. Can we add a fake select clause with a warning message?
    
    Also, the DataFrameWriter.saveAsTable case is like CTAS. Can we then generate the DDL in regular CTAS syntax? This will change my current implementation in this PR.
    Please advise, thanks a lot!
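
To make the shape of that generated statement concrete, here is a minimal Python sketch that assembles the clauses in the order given in the comment. The function name `datasource_ddl` and its parameters are hypothetical, the `OPTIONS` clause is left out for brevity, and the sketch only builds a string: as the comment notes, whether Spark SQL would accept the result without a trailing select clause is a separate question.

```python
from typing import List, Optional, Sequence, Tuple


def datasource_ddl(
    name: str,
    columns: Sequence[Tuple[str, str]],
    provider: str,
    partition_columns: Sequence[str] = (),
    bucket_columns: Sequence[str] = (),
    sort_columns: Sequence[str] = (),
    num_buckets: Optional[int] = None,
) -> str:
    """Assemble a non-CTAS statement in the clause order from the comment."""
    cols = ", ".join(f"{col} {dtype}" for col, dtype in columns)
    parts: List[str] = [f"CREATE TABLE {name} ({cols})", f"USING {provider}"]

    if partition_columns:
        parts.append(f"PARTITIONED BY ({', '.join(partition_columns)})")

    if bucket_columns:
        # Bucketing is only meaningful together with a bucket count.
        if num_buckets is None:
            raise ValueError("bucket columns require a bucket count")
        parts.append(f"CLUSTERED BY ({', '.join(bucket_columns)})")
        if sort_columns:
            parts.append(f"SORTED BY ({', '.join(sort_columns)})")
        parts.append(f"INTO {num_buckets} BUCKETS")

    # No trailing SELECT: the original query is not available from the metadata.
    return " ".join(parts)
```

For example, `datasource_ddl("t1", [("c1", "int")], "parquet", bucket_columns=["c1"], num_buckets=4)` produces a statement with `CLUSTERED BY (c1) INTO 4 BUCKETS` and no select clause, which is exactly the form the comment says cannot yet be executed.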




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214465973
  
    **[Test build #56899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56899/consoleFull)** for PR 12579 at commit [`13e9775`](https://github.com/apache/spark/commit/13e9775604f3365683bf2b0f3b35b80a30f05dd4).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 closed the pull request at:

    https://github.com/apache/spark/pull/12579




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-213032974
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214420652
  
    **[Test build #56899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56899/consoleFull)** for PR 12579 at commit [`13e9775`](https://github.com/apache/spark/commit/13e9775604f3365683bf2b0f3b35b80a30f05dd4).




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-218238618
  
    @srowen Yes, for datasource tables. This PR also includes the work for Hive syntax DDL. I see #12781 mentions that there will be a follow-up PR taking care of the Hive syntax DDL, so I am wondering whether I should continue with this PR. I can close this one if there is no need. Thanks!




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-218549665
  
    @liancheng Thank you for the detailed explanation! Yes, if the goal is to make sure Spark SQL can handle the generated DDL, then we need to leave out some Hive features for now. I will close this PR.




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-218192270
  
    @xwu0226 I think this is superseded by https://github.com/apache/spark/pull/12781 ?




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-218379687
  
    Hey @xwu0226, sorry that I didn't explain why I opened another PR for the same issue, was in code rush for 2.0...
    
    So one of the considerations for all the native DDL commands is that we don't want these DDL commands to rely on Hive anymore. This is because we'd like to remove the Hive dependency from Spark SQL core and gradually make Hive a separate data source in the future. This means we shouldn't add new code in places like `HiveClientImpl`. These new DDL commands should be implemented on top of interfaces like `CatalogTable`.
    
    One apparent problem of this approach is that, current Spark SQL interfaces don't capture all semantics of Hive. For example, some table metadata like skew spec is not covered in `CatalogTable` yet. Our general strategies are:
    
    1. For easy ones, like "owner" and "compressed" in #12844, we may just add them to the interface and leverage them.
    2. For features that are not supported in Spark SQL, for example, skew spec, we can simply ignore them for now, since Spark can't handle them anyway.
    
    There will be a follow-up of #12781 to add support for Hive tables. After offline discussion with @yhuai, we decided to add a flag in `CatalogTable` to indicate whether there is unrecognized metadata that was provided by the underlying external catalog but not translated and included in `CatalogTable`. In this way, when applying `SHOW CREATE TABLE` to tables containing such metadata, this flag can be set to true, and we can simply refuse to output anything by checking this flag. This makes sense because even if you add things like skew spec to the result of `SHOW CREATE TABLE`, Spark can't handle the generated DDL statement.
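
The flag idea can be sketched in a few lines of Python. Everything here is hypothetical: `CatalogTableSketch` is a stand-in rather than Spark's `CatalogTable`, the field name `has_unrecognized_metadata` and the set of recognized keys are invented for illustration, and the thread does not say what the real flag is called or how the refusal is reported.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Metastore keys this sketch knows how to translate (an assumption for the example).
RECOGNIZED_KEYS = {"name", "columns", "owner", "compressed"}


@dataclass
class CatalogTableSketch:
    """Hypothetical stand-in for `CatalogTable`; field names are invented."""
    name: str
    columns: List[Tuple[str, str]]
    owner: str = ""
    compressed: bool = False
    has_unrecognized_metadata: bool = False


def from_external_catalog(raw: Dict[str, Any]) -> CatalogTableSketch:
    """Translate raw metastore metadata, flagging anything left untranslated."""
    leftover = set(raw) - RECOGNIZED_KEYS  # e.g. a skew spec
    return CatalogTableSketch(
        name=raw["name"],
        columns=list(raw["columns"]),
        owner=raw.get("owner", ""),
        compressed=bool(raw.get("compressed", False)),
        has_unrecognized_metadata=bool(leftover),
    )


def show_create_table(table: CatalogTableSketch) -> str:
    """Refuse to emit DDL when the flag is set; otherwise build a simple statement."""
    if table.has_unrecognized_metadata:
        raise ValueError(f"cannot generate DDL for {table.name}: unrecognized metadata")
    cols = ", ".join(f"{col} {dtype}" for col, dtype in table.columns)
    return f"CREATE TABLE {table.name} ({cols})"
```

The point of setting the flag at translation time is that the command never needs to inspect Hive-specific structures itself; it only checks one boolean on the catalog-level object, which matches the goal of keeping new DDL code out of places like `HiveClientImpl`.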




[GitHub] spark pull request: [SPARK-14346][SQL] Show Create Table (Native)

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12579#issuecomment-214466180
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56899/
    Test FAILed.

