Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/03/13 01:35:00 UTC

[jira] [Comment Edited] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

    [ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058333#comment-17058333 ] 

Jungtaek Lim edited comment on SPARK-31136 at 3/13/20, 1:34 AM:
----------------------------------------------------------------

This reminds me about my previous PR:

[https://github.com/apache/spark/pull/27107]

Please go through the comments in the PR again. I'm quoting the key point here:
{quote}The parts differentiating between two syntaxes are skewSpec, rowFormat, and createFileFormat (using any of them would make create statement go into 2nd syntax), and all of them are optional. We're not enforcing to specify it but rely on the parser.
{quote}
I think the parser implementation around CREATE TABLE introduces ambiguity that isn't documented anywhere. It wasn't ambiguous before, since we forced users to specify STORED AS if it wasn't a Hive table. Now the statement resolves to either the default provider or Hive depending on which options are provided, which is non-trivial to reason about. (End users would never know, as it's determined entirely by the parser rules.)
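To illustrate the ambiguity (a hypothetical spark-sql session; the clause names follow the grammar terms quoted above, and the Provider values are what one would expect rather than verified output):
{code}
-- No optional Hive clause: parsed as the 1st syntax, so the default datasource is used
CREATE TABLE t1 (a STRING);

-- Adding one optional clause (here createFileFormat via STORED AS) silently
-- flips the statement into the 2nd syntax: a Hive table
CREATE TABLE t2 (a STRING) STORED AS TEXTFILE;

-- The difference only shows up in the catalog metadata
DESCRIBE EXTENDED t1;  -- Provider: parquet (default datasource)
DESCRIBE EXTENDED t2;  -- Provider: hive
{code}
Nothing in the statements themselves says which syntax will be chosen; the user has to know which optional clauses route the parser to which rule.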

I see this as an issue with "not breaking old behavior". The parser rules have become quite complicated in order to support the legacy config. Never breaking anything will eventually leave us stuck.
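For context, the old behavior is kept reachable behind a legacy config. A sketch of how that looks (the config name here is my assumption from the SPARK-30098 discussion; verify it against your Spark version before relying on it):
{code}
-- Assumed config name; check the migration guide for the exact key
SET spark.sql.legacy.createHiveTableByDefault.enabled=true;

-- With the legacy config on, a bare CREATE TABLE is a Hive table again, as in 2.4
CREATE TABLE t (a STRING);
{code}
Supporting both behaviors is exactly what forces the parser to carry two overlapping CREATE TABLE rules.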


was (Author: kabhwan):
This reminds me about my previous PR:

[https://github.com/apache/spark/pull/27107]

Please go through the comments in the PR again. I'm quoting the key point here:
{quote}The parts differentiating between two syntaxes are skewSpec, rowFormat, and createFileFormat (using any of them would make create statement go into 2nd syntax), and all of them are optional. We're not enforcing to specify it but rely on the parser.
{quote}
I think the parser implementation around CREATE TABLE introduces ambiguity that isn't documented anywhere. It wasn't ambiguous before, since we forced users to specify STORED AS if it wasn't a Hive table. Now the statement resolves to either the default provider or Hive depending on which options are provided, which is non-trivial to reason about.

I see this as an issue with "not breaking old behavior". The parser rules have become quite complicated in order to support the legacy config. Never breaking anything will eventually leave us stuck.

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31136
>                 URL: https://issues.apache.org/jira/browse/SPARK-31136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.061 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Time taken: 0.383 seconds
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.969 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
