Posted to issues@spark.apache.org by "Lantao Jin (Jira)" <ji...@apache.org> on 2019/11/11 01:53:00 UTC

[jira] [Updated] (SPARK-29421) Add an opportunity to change the file format of command CREATE TABLE LIKE

     [ https://issues.apache.org/jira/browse/SPARK-29421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lantao Jin updated SPARK-29421:
-------------------------------
    Description: 
The CREATE TABLE tb1 LIKE tb2 command creates an empty table tb1 based on the definition of table tb2. The most common use case is to create tb1 with the same schema as tb2. An inconvenience is that the command also copies the FileFormat from tb2, so the input/output format and serde cannot be changed. The ability to change the file format is useful in scenarios such as upgrading a table from a low-performance file format to a high-performance one (parquet, orc).

Hive supports the STORED AS syntax for specifying a new file format:
{code}
CREATE TABLE tbl(a int) STORED AS TEXTFILE;
CREATE TABLE tbl2 LIKE tbl STORED AS PARQUET;
{code}
We add a similar syntax for Spark, separated into two features:
1. Specify a different table provider in CREATE TABLE LIKE
2. Hive compatibility

This PR addresses the first one:
Use `USING provider` to specify a different table provider in CREATE TABLE LIKE.
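
As a minimal sketch of the proposed syntax, assuming tb1, tb2 and tb3 are placeholder table names and that the source table tb2 uses the csv provider (an assumption made only for illustration):
{code}
-- Hypothetical source table stored with the csv provider.
CREATE TABLE tb2 (a INT) USING csv;
-- Proposed: copy the definition of tb2, but create tb1 with the parquet provider.
CREATE TABLE tb1 LIKE tb2 USING parquet;
-- Without a USING clause, the provider is copied from tb2, as it is today.
CREATE TABLE tb3 LIKE tb2;
{code}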

  was:
The CREATE TABLE tb1 LIKE tb2 command creates an empty table tb1 based on the definition of table tb2. The most common use case is to create tb1 with the same schema as tb2. An inconvenience is that the command also copies the FileFormat from tb2, so the input/output format and serde cannot be changed. The ability to change the file format is useful in scenarios such as upgrading a table from a low-performance file format to a high-performance one (parquet, orc).

There are two options to enhance it.
Option1: Add a configuration {{spark.sql.createTableLike.fileformat}}. Its default value is "none", which keeps the current behaviour of copying the file format from the source table. After running SET spark.sql.createTableLike.fileformat=parquet, or setting any other valid file format defined in {{HiveSerDe}}, {{CREATE TABLE ... LIKE}} will use the new file format.
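
A session using Option1 might look like the following sketch. The configuration key is the one proposed in this issue, not an existing Spark setting, and tb1 and tb2 are placeholder table names:
{code}
-- Proposed configuration; the default "none" would copy the source table's file format.
SET spark.sql.createTableLike.fileformat=parquet;
-- The unchanged statement would now create tb1 with the parquet file format.
CREATE TABLE tb1 LIKE tb2;
{code}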

Option2: Add the syntax {{USING fileformat}} after {{CREATE TABLE ... LIKE}}. For example,
{code}
CREATE TABLE tb1 LIKE tb2 USING parquet;
{code}
If the USING clause is omitted, the current behaviour is also kept: the file format is copied from the source table.

Both options preserve the current behaviour by default.
We use Option1 with the parquet file format as an enhancement in our production thriftserver, because we need to switch many existing SQL scripts to the new format without modifying them. For the community, however, Option2 could be treated as a new feature, since it requires the user to write the additional USING clause.

cc [~dongjoon] [~hyukjin.kwon] [~joshrosen] [~cloud_fan] [~yumwang]


> Add an opportunity to change the file format of command CREATE TABLE LIKE
> -------------------------------------------------------------------------
>
>                 Key: SPARK-29421
>                 URL: https://issues.apache.org/jira/browse/SPARK-29421
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major


