Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/04/08 18:18:25 UTC

[jira] [Comment Edited] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table

    [ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232414#comment-15232414 ] 

Cheng Lian edited comment on SPARK-14488 at 4/8/16 4:17 PM:
------------------------------------------------------------

Discussed with [~yhuai] offline, and here's the summary:

{{CreateTempTableUsingAsSelect}} has existed since 1.3 (I'm surprised that I never noticed it!). Its semantics are:

# Execute the {{SELECT}} query.
# Store the query result at a user-specified location in the filesystem. Note that this means the {{PATH}} data source option should always be set when using this DDL command.
# Create a temporary table using the written files.
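
For illustration, here is a minimal sketch of how the command would be invoked with the {{PATH}} option set. It assumes a Spark shell where {{sqlContext}} is predefined (as in the ticket description); the output path is a hypothetical placeholder, and whether {{y}} actually ends up temporary depends on the bug described in this ticket.

{code}
// Assumes a Spark shell with a predefined `sqlContext`.
// The path below is a hypothetical placeholder, not taken from the ticket.
sqlContext.range(10).registerTempTable("x")

// Intended semantics:
//   1. run the SELECT,
//   2. write the result as Parquet files under PATH,
//   3. register temporary table `y` on top of those files.
sqlContext.sql(
  """CREATE TEMPORARY TABLE y
    |USING PARQUET
    |OPTIONS (PATH '/tmp/spark-14488/y')
    |AS SELECT * FROM x""".stripMargin)
{code}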

Basically, it can be used to dump query results to the filesystem without creating persisted tables. It's indeed a confusing command and is kinda equivalent to the following DDL sequence:

- {{INSERT OVERWRITE DIRECTORY ... STORED AS ... SELECT ...}}
- {{CREATE TEMPORARY TABLE ... USING ... OPTIONS (PATH ...)}}

However, Spark hasn't implemented {{INSERT OVERWRITE DIRECTORY}} yet. In the long run, we should implement it and deprecate this confusing DDL command.
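
Until then, roughly the same effect can probably be had through the DataFrame API. The sketch below is only an approximation of the two-step sequence above, not what the command does internally. It assumes a Spark shell with a predefined {{sqlContext}} and a registered temporary table {{x}}; the path is a hypothetical placeholder.

{code}
import org.apache.spark.sql.SaveMode

// Hypothetical output location, chosen for illustration only.
val path = "/tmp/spark-14488/y"

// Stand-in for INSERT OVERWRITE DIRECTORY: dump the query result as Parquet files.
sqlContext.table("x").write.mode(SaveMode.Overwrite).parquet(path)

// Stand-in for the CREATE TEMPORARY TABLE step: expose the written files
// as temporary table `y` without touching the metastore.
sqlContext.read.parquet(path).registerTempTable("y")
{code}

With this approach, {{sqlContext.tables().show()}} should list {{y}} with {{isTemporary}} set to true.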

Ticket title and description were updated accordingly.


> "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-14488
>                 URL: https://issues.apache.org/jira/browse/SPARK-14488
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>
> The following Spark shell snippet reproduces this bug:
> {code}
> sqlContext.range(10).registerTempTable("x")
> // The problematic DDL statement:
> sqlContext.sql("CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x")
> sqlContext.tables().show()
> {code}
> It shows the following result:
> {noformat}
> +---------+-----------+
> |tableName|isTemporary|
> +---------+-----------+
> |        y|      false|
> |        x|       true|
> +---------+-----------+
> {noformat}
> Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}.
> Explain shows that the logical plan node is {{CreateTableUsingAsSelect}} (executed as {{CreateMetastoreDataSourceAsSelect}}) rather than {{CreateTempTableUsingAsSelect}}.
> {noformat}
> == Parsed Logical Plan ==
> 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map()
> +- 'Project [*]
>    +- 'UnresolvedRelation `x`, None
> == Analyzed Logical Plan ==
> CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map()
> +- Project [id#0L]
>    +- SubqueryAlias x
>       +- Range 0, 10, 1, 1, [id#0L]
> == Optimized Logical Plan ==
> CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map()
> +- Range 0, 10, 1, 1, [id#0L]
> == Physical Plan ==
> ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, [id#0L]|
> {noformat}


