You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Rui Wang <ru...@google.com> on 2018/08/13 21:12:25 UTC

[Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Hi Community,

BeamSQL allows CREATE TABLE
<https://beam.apache.org/documentation/dsls/sql/create-table/> statements
to register virtual tables from external storage systems (e.g. BigQuery).

BeamSQL is not a storage system, so any table registered by "CREATE TABLE"
statement is essentially equivalent to be registered by "CREATE EXTERNAL
TABLE", which requires the user to provide a LOCATION and BeamSQL will
register the table outside of current execution environment based on
LOCATION.

So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL to help
users understand they are registering tables, and BeamSQL does not create
non existing tables by running CREATE TABLE (at least on some storage
systems, if not all).

We can make the EXTERNAL keyword either required or optional.

If we make the EXTERNAL keyword required:

Pros:
a. We can get rid of the registering table semantic on CREATE TABLE.
b, We keep the room that we could add CREATE TABLE back in the future if we
want CREATE TABLE to create, rather than not only register tables in
BeamSQL.

Cons:
1. CREATE TABLE syntax will not be supported so existing BeamSQL pipelines
which has CREATE TABLE require changes.
2. It's required to type tedious EXTERNAL keyword every time, especially in
SQL Shell.

If we make the EXTERNAL keyword optional, we will have reversed pros and
cons above.

Any thoughts on adding EXTERNAL keyword, and make it required or optional?


Thanks,
Rui

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Rui Wang <ru...@google.com>.
The change of documentation on Beam website is done (see:
create-external-table
<https://beam.apache.org/documentation/dsls/sql/create-external-table/>).
This proposal is finally transformed into master branch. Thanks all.

-Rui

On Fri, Sep 14, 2018 at 12:29 PM Anton Kedin <ke...@google.com> wrote:

> Raising this topic once more. The PR[1] has been open for a while, if
> there is no further input, I'm going to merge it by end of day.
>
> [1]: https://github.com/apache/beam/pull/6252
>
> Thank you,
> Anton
>
>
> On Wed, Aug 15, 2018 at 10:48 PM Tim <ti...@gmail.com> wrote:
>
>> +1 for CREATE EXTERNAL TABLE with similar reasoning given by others on
>> this thread.
>>
>> Tim
>>
>> On 15 Aug 2018, at 23:01, Charles Chen <cc...@google.com> wrote:
>>
>> +1 for CREATE EXTERNAL TABLE.  It is a good balance between the general
>> SQL expectation of having tables as an abstraction and reinforcing that
>> Beam does not store your data.
>>
>> On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <ru...@google.com> wrote:
>>
>>> >  I think users will be more confused to find that 'CREATE TABLE'
>>> doesn't exist then to learn that it might not always create a table.
>>>
>>> >> I think that having CREATE TABLE do something unexpected or not do
>>> something expected (or do the opposite things depending on the table type
>>> or some flag) is worse than having users look up the correct way of
>>> creating a data source in Beam SQL without expecting something we don't
>>> promise.
>>>
>>> I agree on this. Enforcing users to look up documentation for the
>>> correct way is better than letting them use an ambiguous way that could
>>> fail their expectation.
>>>
>>>
>>> -Rui
>>>
>>> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <ke...@google.com> wrote:
>>>
>>>> I think that something unique along the lines of `REGISTER EXTERNAL
>>>> DATA SOURCE` is probably fine, as it doesn't conflict with existing
>>>> behaviors of other dialects.
>>>>
>>>> > There is a lot of value in making sure our common operations closely
>>>> map to the equivalent common operations in other SQL dialects.
>>>>
>>>> We're trying to make opposite points using the same arguments :) A lot
>>>> of popular dialects make difference between CREATE TABLE and CREATE
>>>> EXTERNAL TABLE (or similar):
>>>>  - T-SQL:
>>>>       create:
>>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>>>       create external:
>>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>>>       external datasource:
>>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>>>  - PL/SQL:
>>>>       create:
>>>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>>>       create external:
>>>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>>>  - postgres:
>>>>       import foreign schema:
>>>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>>>       create table:
>>>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>>>  - redshift:
>>>>       create external schema:
>>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>>>       create table:
>>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>>>  - hive internal and external:
>>>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>>>
>>>> My understanding is that the behavior of create table is somewhat
>>>> similar in all of the above dialects, from the high-level perspective it
>>>> usually creates a persistent table in the current storage context
>>>> (database). That's not what Beam SQL's create table does right now, and my
>>>> opinion is that it should not be called create table for this reason.
>>>>
>>>> >  I think users will be more confused to find that 'CREATE TABLE'
>>>> doesn't exist then to learn that it might not always create a table.
>>>>
>>>> I think that having CREATE TABLE do something unexpected or not do
>>>> something expected (or do the opposite things depending on the table type
>>>> or some flag) is worse than having users look up the correct way of
>>>> creating a data source in Beam SQL without expecting something we don't
>>>> promise.
>>>>
>>>> >  (For example, a user guessing at the syntax of CREATE TABLE would
>>>> have a better experience with the error being "field LOCATION not
>>>> specified" rather than "operation CREATE TABLE not found".)
>>>>
>>>> They have to look it up anyway (what format is location for a Pubsub
>>>> topic? or is it a subscription?), and when doing so I think it would be
>>>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL
>>>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
>>>> TABLE`.
>>>>
>>>> External tables and schemas don't have a standard approach and I don't
>>>> have a strong preference between any one from the above.
>>>>
>>>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> Adding dev@ back now.
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Did we drop the dev list from this on purpose? (I haven't added it
>>>>>> back, but we probably should.)
>>>>>>
>>>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>>>>>> SCHEMA' if there is only to be one option. Sticking with those names
>>>>>> minimizes both our deviation from other implementations and user surprise.
>>>>>> There is a lot of value in making sure our common operations closely map to
>>>>>> the equivalent common operations in other SQL dialects. I think users will
>>>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>>>>>> that it might not always create a table. This minimizes the overhead of
>>>>>> learning our dialect of SQL and maximizes the odds that a user will be able
>>>>>> to guess at the syntax of something and have it work. (For example, a user
>>>>>> guessing at the syntax of CREATE TABLE would have a better experience with
>>>>>> the error being "field LOCATION not specified" rather than "operation
>>>>>> CREATE TABLE not found".)
>>>>>>
>>>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>>>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names
>>>>>> remove the ambiguity around the operation creating and the data source
>>>>>> being a table.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My preference is to make `EXTERNAL` mandatory and only support
>>>>>>> `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>>>  - user friendliness, matching expectations, readability. Current
>>>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>>>>>> a table;
>>>>>>>  - forward-compatibility. We could potentially support non-external
>>>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>>>>>> have. It will be wrong to use the same syntax for external and non-external
>>>>>>> CREATEs;
>>>>>>>
>>>>>>> I agree that typing extra word each time is not ideal, but my
>>>>>>> opinion is on the side that readability of code (including SQL) is
>>>>>>> important (how much time you spend reading / understanding code vs writing
>>>>>>> it) and we should try to improve it if we can. In case of DDL every
>>>>>>> non-trivial statement will already have a ton of unavoidable words (field
>>>>>>> names, types, location, options) so I would argue that adding extra one
>>>>>>> word would not noticeably reduce your happiness of writing it :) But it
>>>>>>> would improve readability and reduce ambiguity, which I think is worth it.
>>>>>>>
>>>>>>> I think that making it optional only introduces more confusion (e.g.
>>>>>>> what's the difference between the two DDL statements without reading the
>>>>>>> doc?) and would make situation worse.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Anton
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>>>>>> `EXTERNAL` as optional?
>>>>>>>>
>>>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <
>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>
>>>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer
>>>>>>>>> from a documentation prospective, but I'd be really unhappy if I had to
>>>>>>>>> type out 'EXTERNAL' every time. (I have the same concern with 'CREATE
>>>>>>>>> EXTERNAL SCHEMA'.)
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> I know you are probably using CREATE TABLE, Can I know your
>>>>>>>>>> thoughts on this?
>>>>>>>>>>
>>>>>>>>>> -Rui
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>>>>>> better.
>>>>>>>>>>>
>>>>>>>>>>> "create external" is being widely used by different systems with
>>>>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>>>
>>>>>>>>>>> -Rui
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <
>>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate
>>>>>>>>>>>> that present, when I was triaging post-commit tests.
>>>>>>>>>>>>
>>>>>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>>>>>
>>>>>>>>>>>> --Mikhail
>>>>>>>>>>>>
>>>>>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>>>>>> rfernand@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>>>>>> T-SQL
>>>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>>>>>>> DML statements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>>>>>>> BigQuery).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>>>>>>> based on LOCATION.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in
>>>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and BeamSQL
>>>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at least on
>>>>>>>>>>>>>> some storage systems, if not all).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pros:
>>>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>>>>>> TABLE.
>>>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in
>>>>>>>>>>>>>> the future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cons:
>>>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing
>>>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have
>>>>>>>>>>>>>> reversed pros and cons above.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required
>>>>>>>>>>>>>> or optional?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Rui
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ----
>>>>>>>> Mingmin
>>>>>>>>
>>>>>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Anton Kedin <ke...@google.com>.
Raising this topic once more. The PR[1] has been open for a while, if there
is no further input, I'm going to merge it by end of day.

[1]: https://github.com/apache/beam/pull/6252

Thank you,
Anton


On Wed, Aug 15, 2018 at 10:48 PM Tim <ti...@gmail.com> wrote:

> +1 for CREATE EXTERNAL TABLE with similar reasoning given by others on
> this thread.
>
> Tim
>
> On 15 Aug 2018, at 23:01, Charles Chen <cc...@google.com> wrote:
>
> +1 for CREATE EXTERNAL TABLE.  It is a good balance between the general
> SQL expectation of having tables as an abstraction and reinforcing that
> Beam does not store your data.
>
> On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <ru...@google.com> wrote:
>
>> >  I think users will be more confused to find that 'CREATE TABLE'
>> doesn't exist then to learn that it might not always create a table.
>>
>> >> I think that having CREATE TABLE do something unexpected or not do
>> something expected (or do the opposite things depending on the table type
>> or some flag) is worse than having users look up the correct way of
>> creating a data source in Beam SQL without expecting something we don't
>> promise.
>>
>> I agree on this. Enforcing users to look up documentation for the correct
>> way is better than letting them use an ambiguous way that could fail their
>> expectation.
>>
>>
>> -Rui
>>
>> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <ke...@google.com> wrote:
>>
>>> I think that something unique along the lines of `REGISTER EXTERNAL DATA
>>> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
>>> other dialects.
>>>
>>> > There is a lot of value in making sure our common operations closely
>>> map to the equivalent common operations in other SQL dialects.
>>>
>>> We're trying to make opposite points using the same arguments :) A lot
>>> of popular dialects make difference between CREATE TABLE and CREATE
>>> EXTERNAL TABLE (or similar):
>>>  - T-SQL:
>>>       create:
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>>       create external:
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>>       external datasource:
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>>  - PL/SQL:
>>>       create:
>>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>>       create external:
>>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>>  - postgres:
>>>       import foreign schema:
>>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>>       create table:
>>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>>  - redshift:
>>>       create external schema:
>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>>       create table:
>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>>  - hive internal and external:
>>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>>
>>> My understanding is that the behavior of create table is somewhat
>>> similar in all of the above dialects, from the high-level perspective it
>>> usually creates a persistent table in the current storage context
>>> (database). That's not what Beam SQL's create table does right now, and my
>>> opinion is that it should not be called create table for this reason.
>>>
>>> >  I think users will be more confused to find that 'CREATE TABLE'
>>> doesn't exist then to learn that it might not always create a table.
>>>
>>> I think that having CREATE TABLE do something unexpected or not do
>>> something expected (or do the opposite things depending on the table type
>>> or some flag) is worse than having users look up the correct way of
>>> creating a data source in Beam SQL without expecting something we don't
>>> promise.
>>>
>>> >  (For example, a user guessing at the syntax of CREATE TABLE would
>>> have a better experience with the error being "field LOCATION not
>>> specified" rather than "operation CREATE TABLE not found".)
>>>
>>> They have to look it up anyway (what format is location for a Pubsub
>>> topic? or is it a subscription?), and when doing so I think it would be
>>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL
>>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
>>> TABLE`.
>>>
>>> External tables and schemas don't have a standard approach and I don't
>>> have a strong preference between any one from the above.
>>>
>>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:
>>>
>>>> Adding dev@ back now.
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> Did we drop the dev list from this on purpose? (I haven't added it
>>>>> back, but we probably should.)
>>>>>
>>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>>>>> SCHEMA' if there is only to be one option. Sticking with those names
>>>>> minimizes both our deviation from other implementations and user surprise.
>>>>> There is a lot of value in making sure our common operations closely map to
>>>>> the equivalent common operations in other SQL dialects. I think users will
>>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>>>>> that it might not always create a table. This minimizes the overhead of
>>>>> learning our dialect of SQL and maximizes the odds that a user will be able
>>>>> to guess at the syntax of something and have it work. (For example, a user
>>>>> guessing at the syntax of CREATE TABLE would have a better experience with
>>>>> the error being "field LOCATION not specified" rather than "operation
>>>>> CREATE TABLE not found".)
>>>>>
>>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names
>>>>> remove the ambiguity around the operation creating and the data source
>>>>> being a table.
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>>>>>
>>>>>> My preference is to make `EXTERNAL` mandatory and only support
>>>>>> `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>>  - user friendliness, matching expectations, readability. Current
>>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>>>>> a table;
>>>>>>  - forward-compatibility. We could potentially support non-external
>>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>>>>> have. It will be wrong to use the same syntax for external and non-external
>>>>>> CREATEs;
>>>>>>
>>>>>> I agree that typing extra word each time is not ideal, but my opinion
>>>>>> is on the side that readability of code (including SQL) is important (how
>>>>>> much time you spend reading / understanding code vs writing it) and we
>>>>>> should try to improve it if we can. In case of DDL every non-trivial
>>>>>> statement will already have a ton of unavoidable words (field names, types,
>>>>>> location, options) so I would argue that adding extra one word would not
>>>>>> noticeably reduce your happiness of writing it :) But it would improve
>>>>>> readability and reduce ambiguity, which I think is worth it.
>>>>>>
>>>>>> I think that making it optional only introduces more confusion (e.g.
>>>>>> what's the difference between the two DDL statements without reading the
>>>>>> doc?) and would make situation worse.
>>>>>>
>>>>>> Regards,
>>>>>> Anton
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>>>>> `EXTERNAL` as optional?
>>>>>>>
>>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <
>>>>>>> apilloud@google.com> wrote:
>>>>>>>
>>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer
>>>>>>>> from a documentation prospective, but I'd be really unhappy if I had to
>>>>>>>> type out 'EXTERNAL' every time. (I have the same concern with 'CREATE
>>>>>>>> EXTERNAL SCHEMA'.)
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I know you are probably using CREATE TABLE, Can I know your
>>>>>>>>> thoughts on this?
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>>>>> better.
>>>>>>>>>>
>>>>>>>>>> "create external" is being widely used by different systems with
>>>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>>
>>>>>>>>>> -Rui
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <
>>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>>>>
>>>>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>>>>
>>>>>>>>>>> --Mikhail
>>>>>>>>>>>
>>>>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>>>>> rfernand@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>>>>> T-SQL
>>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>>>>>> DML statements.
>>>>>>>>>>>>
>>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>>>>>> BigQuery).
>>>>>>>>>>>>>
>>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>>>>>> based on LOCATION.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in
>>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and BeamSQL
>>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at least on
>>>>>>>>>>>>> some storage systems, if not all).
>>>>>>>>>>>>>
>>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pros:
>>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>>>>> TABLE.
>>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cons:
>>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing
>>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have
>>>>>>>>>>>>> reversed pros and cons above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required
>>>>>>>>>>>>> or optional?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Rui
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ----
>>>>>>> Mingmin
>>>>>>>
>>>>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Tim <ti...@gmail.com>.
+1 for CREATE EXTERNAL TABLE with similar reasoning given by others on this thread.

Tim

> On 15 Aug 2018, at 23:01, Charles Chen <cc...@google.com> wrote:
> 
> +1 for CREATE EXTERNAL TABLE.  It is a good balance between the general SQL expectation of having tables as an abstraction and reinforcing that Beam does not store your data.
> 
>> On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <ru...@google.com> wrote:
>> >  I think users will be more confused to find that 'CREATE TABLE' doesn't exist then to learn that it might not always create a table. 
>> 
>> >> I think that having CREATE TABLE do something unexpected or not do something expected (or do the opposite things depending on the table type or some flag) is worse than having users look up the correct way of creating a data source in Beam SQL without expecting something we don't promise.
>> 
>> I agree on this. Enforcing users to look up documentation for the correct way is better than letting them use an ambiguous way that could fail their expectation.
>> 
>> 
>> -Rui
>> 
>>> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <ke...@google.com> wrote:
>>> I think that something unique along the lines of `REGISTER EXTERNAL DATA SOURCE` is probably fine, as it doesn't conflict with existing behaviors of other dialects.
>>> 
>>> > There is a lot of value in making sure our common operations closely map to the equivalent common operations in other SQL dialects. 
>>> 
>>> We're trying to make opposite points using the same arguments :) A lot of popular dialects make difference between CREATE TABLE and CREATE EXTERNAL TABLE (or similar):
>>>  - T-SQL:
>>>       create: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>>       create external: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>>       external datasource: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>>  - PL/SQL:
>>>       create: https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>>       create external: https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>>  - postgres:
>>>       import foreign schema: https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>>       create table: https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>>  - redshift:
>>>       create external schema: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>>       create table: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>>  - hive internal and external: https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>> 
>>> My understanding is that the behavior of create table is somewhat similar in all of the above dialects, from the high-level perspective it usually creates a persistent table in the current storage context (database). That's not what Beam SQL's create table does right now, and my opinion is that it should not be called create table for this reason.
>>> 
>>> >  I think users will be more confused to find that 'CREATE TABLE' doesn't exist then to learn that it might not always create a table. 
>>> 
>>> I think that having CREATE TABLE do something unexpected or not do something expected (or do the opposite things depending on the table type or some flag) is worse than having users look up the correct way of creating a data source in Beam SQL without expecting something we don't promise.
>>> 
>>> >  (For example, a user guessing at the syntax of CREATE TABLE would have a better experience with the error being "field LOCATION not specified" rather than "operation CREATE TABLE not found".)
>>> 
>>> They have to look it up anyway (what format is location for a Pubsub topic? or is it a subscription?), and when doing so I think it would be less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE TABLE`.
>>> 
>>> External tables and schemas don't have a standard approach and I don't have a strong preference between any one from the above. 
>>> 
>>>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:
>>>> Adding dev@ back now.
>>>> 
>>>> -Rui
>>>> 
>>>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com> wrote:
>>>>> Did we drop the dev list from this on purpose? (I haven't added it back, but we probably should.)
>>>>> 
>>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE SCHEMA' if there is only to be one option. Sticking with those names minimizes both our deviation from other implementations and user surprise. There is a lot of value in making sure our common operations closely map to the equivalent common operations in other SQL dialects. I think users will be more confused to find that 'CREATE TABLE' doesn't exist then to learn that it might not always create a table. This minimizes the overhead of learning our dialect of SQL and maximizes the odds that a user will be able to guess at the syntax of something and have it work. (For example, a user guessing at the syntax of CREATE TABLE would have a better experience with the error being "field LOCATION not specified" rather than "operation CREATE TABLE not found".)
>>>>> 
>>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names remove the ambiguity around the operation creating and the data source being a table.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>>>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>>  - user friendliness, matching expectations, readability. Current `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to users familiar with SQL who expect that `CREATE TABLE` will actually create a table;
>>>>>>  - forward-compatibility. We could potentially support non-external `CREATE TABLE` at some point in the future, whatever semantics it might have. It will be wrong to use the same syntax for external and non-external CREATEs;
>>>>>> 
>>>>>> I agree that typing extra word each time is not ideal, but my opinion is on the side that readability of code (including SQL) is important (how much time you spend reading / understanding code vs writing it) and we should try to improve it if we can. In case of DDL every non-trivial statement will already have a ton of unavoidable words (field names, types, location, options) so I would argue that adding extra one word would not noticeably reduce your happiness of writing it :) But it would improve readability and reduce ambiguity, which I think is worth it.
>>>>>> 
>>>>>> I think that making it optional only introduces more confusion (e.g. what's the difference between the two DDL statements without reading the doc?) and would make situation worse.
>>>>>> 
>>>>>> Regards,
>>>>>> Anton
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com> wrote:
>>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making `EXTERNAL` as optional?  
>>>>>>> 
>>>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <ap...@google.com> wrote:
>>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from a documentation prospective, but I'd be really unhappy if I had to type out 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL SCHEMA'.)
>>>>>>>> 
>>>>>>>> Andrew
>>>>>>>> 
>>>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>> Hi guys,
>>>>>>>>> 
>>>>>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts on this?
>>>>>>>>> 
>>>>>>>>> -Rui
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com> wrote:
>>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be better.
>>>>>>>>>> 
>>>>>>>>>> "create external" is being widely used by different systems with similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>> 
>>>>>>>>>> -Rui
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <mi...@google.com> wrote:
>>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that present, when I was triaging post-commit tests.
>>>>>>>>>>> 
>>>>>>>>>>> Do we have any terms that specify connection to external table? "CREATE" word triggers this reaction in my brain that there will be a new table created. Adding "EXTERNAL" would already add distinction, but adding something more explicit for the task might be even better.
>>>>>>>>>>> 
>>>>>>>>>>> --Mikhail
>>>>>>>>>>> 
>>>>>>>>>>> Have feedback? 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <rf...@google.com> wrote:
>>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la T-SQL), if it's somehow advantageous for us to leverage access patterns or restrict DML statements.
>>>>>>>>>>>> 
>>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BeamSQL allows CREATE TABLE statements to register virtual tables from external storage systems (e.g. BigQuery). 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by "CREATE TABLE" statement is essentially equivalent to be registered by "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and BeamSQL will register the table outside of current execution environment based on LOCATION.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL to help users understand they are registering tables, and BeamSQL does not create non existing tables by running CREATE TABLE (at least on some storage systems, if not all). 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Pros:
>>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE TABLE. 
>>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the future if we want CREATE TABLE to create, rather than not only register tables in BeamSQL. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cons:
>>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time, especially in SQL Shell.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed pros and cons above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or optional?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Rui
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> ----
>>>>>>> Mingmin

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Charles Chen <cc...@google.com>.
+1 for CREATE EXTERNAL TABLE.  It is a good balance between the general SQL
expectation of having tables as an abstraction and reinforcing that Beam
does not store your data.

On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <ru...@google.com> wrote:

> >  I think users will be more confused to find that 'CREATE TABLE' doesn't
> exist then to learn that it might not always create a table.
>
> >> I think that having CREATE TABLE do something unexpected or not do
> something expected (or do the opposite things depending on the table type
> or some flag) is worse than having users look up the correct way of
> creating a data source in Beam SQL without expecting something we don't
> promise.
>
> I agree on this. Enforcing users to look up documentation for the correct
> way is better than letting them use an ambiguous way that could fail their
> expectation.
>
>
> -Rui
>
> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <ke...@google.com> wrote:
>
>> I think that something unique along the lines of `REGISTER EXTERNAL DATA
>> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
>> other dialects.
>>
>> > There is a lot of value in making sure our common operations closely
>> map to the equivalent common operations in other SQL dialects.
>>
>> We're trying to make opposite points using the same arguments :) A lot of
>> popular dialects make difference between CREATE TABLE and CREATE EXTERNAL
>> TABLE (or similar):
>>  - T-SQL:
>>       create:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>       create external:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>       external datasource:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>  - PL/SQL:
>>       create:
>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>       create external:
>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>  - postgres:
>>       import foreign schema:
>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>       create table:
>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>  - redshift:
>>       create external schema:
>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>       create table:
>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>  - hive internal and external:
>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>
>> My understanding is that the behavior of create table is somewhat similar
>> in all of the above dialects, from the high-level perspective it usually
>> creates a persistent table in the current storage context (database).
>> That's not what Beam SQL's create table does right now, and my opinion is
>> that it should not be called create table for this reason.
>>
>> >  I think users will be more confused to find that 'CREATE TABLE'
>> doesn't exist then to learn that it might not always create a table.
>>
>> I think that having CREATE TABLE do something unexpected or not do
>> something expected (or do the opposite things depending on the table type
>> or some flag) is worse than having users look up the correct way of
>> creating a data source in Beam SQL without expecting something we don't
>> promise.
>>
>> >  (For example, a user guessing at the syntax of CREATE TABLE would have
>> a better experience with the error being "field LOCATION not specified"
>> rather than "operation CREATE TABLE not found".)
>>
>> They have to look it up anyway (what format is location for a Pubsub
>> topic? or is it a subscription?), and when doing so I think it would be
>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL
>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
>> TABLE`.
>>
>> External tables and schemas don't have a standard approach and I don't
>> have a strong preference between any one from the above.
>>
>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:
>>
>>> Adding dev@ back now.
>>>
>>> -Rui
>>>
>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> Did we drop the dev list from this on purpose? (I haven't added it
>>>> back, but we probably should.)
>>>>
>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>>>> SCHEMA' if there is only to be one option. Sticking with those names
>>>> minimizes both our deviation from other implementations and user surprise.
>>>> There is a lot of value in making sure our common operations closely map to
>>>> the equivalent common operations in other SQL dialects. I think users will
>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>>>> that it might not always create a table. This minimizes the overhead of
>>>> learning our dialect of SQL and maximizes the odds that a user will be able
>>>> to guess at the syntax of something and have it work. (For example, a user
>>>> guessing at the syntax of CREATE TABLE would have a better experience with
>>>> the error being "field LOCATION not specified" rather than "operation
>>>> CREATE TABLE not found".)
>>>>
>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names
>>>> remove the ambiguity around the operation creating and the data source
>>>> being a table.
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>>>>
>>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>>>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>  - user friendliness, matching expectations, readability. Current
>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>>>> a table;
>>>>>  - forward-compatibility. We could potentially support non-external
>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>>>> have. It will be wrong to use the same syntax for external and non-external
>>>>> CREATEs;
>>>>>
>>>>> I agree that typing extra word each time is not ideal, but my opinion
>>>>> is on the side that readability of code (including SQL) is important (how
>>>>> much time you spend reading / understanding code vs writing it) and we
>>>>> should try to improve it if we can. In case of DDL every non-trivial
>>>>> statement will already have a ton of unavoidable words (field names, types,
>>>>> location, options) so I would argue that adding extra one word would not
>>>>> noticeably reduce your happiness of writing it :) But it would improve
>>>>> readability and reduce ambiguity, which I think is worth it.
>>>>>
>>>>> I think that making it optional only introduces more confusion (e.g.
>>>>> what's the difference between the two DDL statements without reading the
>>>>> doc?) and would make situation worse.
>>>>>
>>>>> Regards,
>>>>> Anton
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>>>> `EXTERNAL` as optional?
>>>>>>
>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <apilloud@google.com
>>>>>> > wrote:
>>>>>>
>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from
>>>>>>> a documentation prospective, but I'd be really unhappy if I had to type out
>>>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>>>>> SCHEMA'.)
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I know you are probably using CREATE TABLE, Can I know your
>>>>>>>> thoughts on this?
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>>>> better.
>>>>>>>>>
>>>>>>>>> "create external" is being widely used by different systems with
>>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <
>>>>>>>>> migryz@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>>>
>>>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>>>
>>>>>>>>>> --Mikhail
>>>>>>>>>>
>>>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>>>> rfernand@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>>>> T-SQL
>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>>>>> DML statements.
>>>>>>>>>>>
>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>
>>>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>>>>> BigQuery).
>>>>>>>>>>>>
>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>>>>> based on LOCATION.
>>>>>>>>>>>>
>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in
>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and BeamSQL
>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at least on
>>>>>>>>>>>> some storage systems, if not all).
>>>>>>>>>>>>
>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>
>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>
>>>>>>>>>>>> Pros:
>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>>>> TABLE.
>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>>>
>>>>>>>>>>>> Cons:
>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing
>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>>
>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>>>>> pros and cons above.
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required
>>>>>>>>>>>> or optional?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Rui
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Rui Wang <ru...@google.com>.
>  I think users will be more confused to find that 'CREATE TABLE' doesn't
exist then to learn that it might not always create a table.

>> I think that having CREATE TABLE do something unexpected or not do
something expected (or do the opposite things depending on the table type
or some flag) is worse than having users look up the correct way of
creating a data source in Beam SQL without expecting something we don't
promise.

I agree on this. Enforcing users to look up documentation for the correct
way is better than letting them use an ambiguous way that could fail their
expectation.


-Rui

On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <ke...@google.com> wrote:

> I think that something unique along the lines of `REGISTER EXTERNAL DATA
> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
> other dialects.
>
> > There is a lot of value in making sure our common operations closely map
> to the equivalent common operations in other SQL dialects.
>
> We're trying to make opposite points using the same arguments :) A lot of
> popular dialects make difference between CREATE TABLE and CREATE EXTERNAL
> TABLE (or similar):
>  - T-SQL:
>       create:
> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>       create external:
> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>       external datasource:
> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>  - PL/SQL:
>       create:
> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>       create external:
> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>  - postgres:
>       import foreign schema:
> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>       create table:
> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>  - redshift:
>       create external schema:
> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>       create table:
> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>  - hive internal and external:
> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>
> My understanding is that the behavior of create table is somewhat similar
> in all of the above dialects, from the high-level perspective it usually
> creates a persistent table in the current storage context (database).
> That's not what Beam SQL's create table does right now, and my opinion is
> that it should not be called create table for this reason.
>
> >  I think users will be more confused to find that 'CREATE TABLE' doesn't
> exist then to learn that it might not always create a table.
>
> I think that having CREATE TABLE do something unexpected or not do
> something expected (or do the opposite things depending on the table type
> or some flag) is worse than having users look up the correct way of
> creating a data source in Beam SQL without expecting something we don't
> promise.
>
> >  (For example, a user guessing at the syntax of CREATE TABLE would have
> a better experience with the error being "field LOCATION not specified"
> rather than "operation CREATE TABLE not found".)
>
> They have to look it up anyway (what format is location for a Pubsub
> topic? or is it a subscription?), and when doing so I think it would be
> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL
> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
> TABLE`.
>
> External tables and schemas don't have a standard approach and I don't
> have a strong preference between any one from the above.
>
> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:
>
>> Adding dev@ back now.
>>
>> -Rui
>>
>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> Did we drop the dev list from this on purpose? (I haven't added it back,
>>> but we probably should.)
>>>
>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>>> SCHEMA' if there is only to be one option. Sticking with those names
>>> minimizes both our deviation from other implementations and user surprise.
>>> There is a lot of value in making sure our common operations closely map to
>>> the equivalent common operations in other SQL dialects. I think users will
>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>>> that it might not always create a table. This minimizes the overhead of
>>> learning our dialect of SQL and maximizes the odds that a user will be able
>>> to guess at the syntax of something and have it work. (For example, a user
>>> guessing at the syntax of CREATE TABLE would have a better experience with
>>> the error being "field LOCATION not specified" rather than "operation
>>> CREATE TABLE not found".)
>>>
>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names
>>> remove the ambiguity around the operation creating and the data source
>>> being a table.
>>>
>>> Andrew
>>>
>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>>>
>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>  - user friendliness, matching expectations, readability. Current
>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>>> a table;
>>>>  - forward-compatibility. We could potentially support non-external
>>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>>> have. It will be wrong to use the same syntax for external and non-external
>>>> CREATEs;
>>>>
>>>> I agree that typing extra word each time is not ideal, but my opinion
>>>> is on the side that readability of code (including SQL) is important (how
>>>> much time you spend reading / understanding code vs writing it) and we
>>>> should try to improve it if we can. In case of DDL every non-trivial
>>>> statement will already have a ton of unavoidable words (field names, types,
>>>> location, options) so I would argue that adding extra one word would not
>>>> noticeably reduce your happiness of writing it :) But it would improve
>>>> readability and reduce ambiguity, which I think is worth it.
>>>>
>>>> I think that making it optional only introduces more confusion (e.g.
>>>> what's the difference between the two DDL statements without reading the
>>>> doc?) and would make situation worse.
>>>>
>>>> Regards,
>>>> Anton
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com> wrote:
>>>>
>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>>> `EXTERNAL` as optional?
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from
>>>>>> a documentation prospective, but I'd be really unhappy if I had to type out
>>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>>>> SCHEMA'.)
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts
>>>>>>> on this?
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com> wrote:
>>>>>>>
>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>>> better.
>>>>>>>>
>>>>>>>> "create external" is being widely used by different systems with
>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <
>>>>>>>> migryz@google.com> wrote:
>>>>>>>>
>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>>
>>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>>
>>>>>>>>> --Mikhail
>>>>>>>>>
>>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>>> rfernand@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>>> T-SQL
>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>>>> DML statements.
>>>>>>>>>>
>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Community,
>>>>>>>>>>>
>>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>>>> BigQuery).
>>>>>>>>>>>
>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>>>> based on LOCATION.
>>>>>>>>>>>
>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in
>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and BeamSQL
>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at least on
>>>>>>>>>>> some storage systems, if not all).
>>>>>>>>>>>
>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>
>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>
>>>>>>>>>>> Pros:
>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>>> TABLE.
>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>>
>>>>>>>>>>> Cons:
>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>>>>>>>>>>> pipelines which has CREATE TABLE require changes.
>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>
>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>>>> pros and cons above.
>>>>>>>>>>>
>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or
>>>>>>>>>>> optional?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Rui
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ----
>>>>> Mingmin
>>>>>
>>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Anton Kedin <ke...@google.com>.
I think that something unique along the lines of `REGISTER EXTERNAL DATA
SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
other dialects.

> There is a lot of value in making sure our common operations closely map
to the equivalent common operations in other SQL dialects.

We're trying to make opposite points using the same arguments :) A lot of
popular dialects make difference between CREATE TABLE and CREATE EXTERNAL
TABLE (or similar):
 - T-SQL:
      create:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
      create external:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
      external datasource:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
 - PL/SQL:
      create:
https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
      create external:
https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
 - postgres:
      import foreign schema:
https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
      create table:
https://www.postgresql.org/docs/9.1/static/sql-createtable.html
 - redshift:
      create external schema:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
      create table:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
 - hive internal and external:
https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables

My understanding is that the behavior of create table is somewhat similar
in all of the above dialects, from the high-level perspective it usually
creates a persistent table in the current storage context (database).
That's not what Beam SQL's create table does right now, and my opinion is
that it should not be called create table for this reason.

>  I think users will be more confused to find that 'CREATE TABLE' doesn't
exist then to learn that it might not always create a table.

I think that having CREATE TABLE do something unexpected or not do
something expected (or do the opposite things depending on the table type
or some flag) is worse than having users look up the correct way of
creating a data source in Beam SQL without expecting something we don't
promise.

>  (For example, a user guessing at the syntax of CREATE TABLE would have a
better experience with the error being "field LOCATION not specified"
rather than "operation CREATE TABLE not found".)

They have to look it up anyway (what format is location for a Pubsub topic?
or is it a subscription?), and when doing so I think it would be less
confusing to read that to get data from Pubsub/Kafka/... in Beam SQL you
have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
TABLE`.

External tables and schemas don't have a standard approach and I don't have
a strong preference between any one from the above.

On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <ru...@google.com> wrote:

> Adding dev@ back now.
>
> -Rui
>
> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com>
> wrote:
>
>> Did we drop the dev list from this on purpose? (I haven't added it back,
>> but we probably should.)
>>
>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>> SCHEMA' if there is only to be one option. Sticking with those names
>> minimizes both our deviation from other implementations and user surprise.
>> There is a lot of value in making sure our common operations closely map to
>> the equivalent common operations in other SQL dialects. I think users will
>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>> that it might not always create a table. This minimizes the overhead of
>> learning our dialect of SQL and maximizes the odds that a user will be able
>> to guess at the syntax of something and have it work. (For example, a user
>> guessing at the syntax of CREATE TABLE would have a better experience with
>> the error being "field LOCATION not specified" rather than "operation
>> CREATE TABLE not found".)
>>
>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names remove
>> the ambiguity around the operation creating and the data source being a
>> table.
>>
>> Andrew
>>
>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>>
>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>  - user friendliness, matching expectations, readability. Current
>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>> a table;
>>>  - forward-compatibility. We could potentially support non-external
>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>> have. It will be wrong to use the same syntax for external and non-external
>>> CREATEs;
>>>
>>> I agree that typing extra word each time is not ideal, but my opinion is
>>> on the side that readability of code (including SQL) is important (how much
>>> time you spend reading / understanding code vs writing it) and we should
>>> try to improve it if we can. In case of DDL every non-trivial statement
>>> will already have a ton of unavoidable words (field names, types, location,
>>> options) so I would argue that adding extra one word would not noticeably
>>> reduce your happiness of writing it :) But it would improve readability and
>>> reduce ambiguity, which I think is worth it.
>>>
>>> I think that making it optional only introduces more confusion (e.g.
>>> what's the difference between the two DDL statements without reading the
>>> doc?) and would make situation worse.
>>>
>>> Regards,
>>> Anton
>>>
>>>
>>>
>>>
>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com> wrote:
>>>
>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>> `EXTERNAL` as optional?
>>>>
>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from a
>>>>> documentation prospective, but I'd be really unhappy if I had to type out
>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>>> SCHEMA'.)
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts
>>>>>> on this?
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com> wrote:
>>>>>>
>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>> better.
>>>>>>>
>>>>>>> "create external" is being widely used by different systems with
>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <mi...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>
>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>
>>>>>>>> --Mikhail
>>>>>>>>
>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>> rfernand@google.com> wrote:
>>>>>>>>
>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>> T-SQL
>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>>> DML statements.
>>>>>>>>>
>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>
>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Community,
>>>>>>>>>>
>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>>> BigQuery).
>>>>>>>>>>
>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>>> based on LOCATION.
>>>>>>>>>>
>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL
>>>>>>>>>> to help users understand they are registering tables, and BeamSQL does not
>>>>>>>>>> create non existing tables by running CREATE TABLE (at least on some
>>>>>>>>>> storage systems, if not all).
>>>>>>>>>>
>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>
>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>
>>>>>>>>>> Pros:
>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>> TABLE.
>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>
>>>>>>>>>> Cons:
>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>>>>>>>>>> pipelines which has CREATE TABLE require changes.
>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>
>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>>> pros and cons above.
>>>>>>>>>>
>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or
>>>>>>>>>> optional?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rui
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Rui Wang <ru...@google.com>.
Adding dev@ back now.

-Rui

On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <ap...@google.com> wrote:

> Did we drop the dev list from this on purpose? (I haven't added it back,
> but we probably should.)
>
> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
> SCHEMA' if there is only to be one option. Sticking with those names
> minimizes both our deviation from other implementations and user surprise.
> There is a lot of value in making sure our common operations closely map to
> the equivalent common operations in other SQL dialects. I think users will
> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
> that it might not always create a table. This minimizes the overhead of
> learning our dialect of SQL and maximizes the odds that a user will be able
> to guess at the syntax of something and have it work. (For example, a user
> guessing at the syntax of CREATE TABLE would have a better experience with
> the error being "field LOCATION not specified" rather than "operation
> CREATE TABLE not found".)
>
> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names remove
> the ambiguity around the operation creating and the data source being a
> table.
>
> Andrew
>
> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <ke...@google.com> wrote:
>
>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>  - user friendliness, matching expectations, readability. Current `CREATE
>> TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to users
>> familiar with SQL who expect that `CREATE TABLE` will actually create a
>> table;
>>  - forward-compatibility. We could potentially support non-external
>> `CREATE TABLE` at some point in the future, whatever semantics it might
>> have. It will be wrong to use the same syntax for external and non-external
>> CREATEs;
>>
>> I agree that typing extra word each time is not ideal, but my opinion is
>> on the side that readability of code (including SQL) is important (how much
>> time you spend reading / understanding code vs writing it) and we should
>> try to improve it if we can. In case of DDL every non-trivial statement
>> will already have a ton of unavoidable words (field names, types, location,
>> options) so I would argue that adding extra one word would not noticeably
>> reduce your happiness of writing it :) But it would improve readability and
>> reduce ambiguity, which I think is worth it.
>>
>> I think that making it optional only introduces more confusion (e.g.
>> what's the difference between the two DDL statements without reading the
>> doc?) and would make situation worse.
>>
>> Regards,
>> Anton
>>
>>
>>
>>
>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <mi...@gmail.com> wrote:
>>
>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>> `EXTERNAL` as optional?
>>>
>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from a
>>>> documentation prospective, but I'd be really unhappy if I had to type out
>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>> SCHEMA'.)
>>>>
>>>> Andrew
>>>>
>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts
>>>>> on this?
>>>>>
>>>>> -Rui
>>>>>
>>>>>
>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be better.
>>>>>>
>>>>>> "create external" is being widely used by different systems with
>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <mi...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>
>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a new
>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but adding
>>>>>>> something more explicit for the task might be even better.
>>>>>>>
>>>>>>> --Mikhail
>>>>>>>
>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>> rfernand@google.com> wrote:
>>>>>>>
>>>>>>>> Strictly speaking, they are not necessarily tables either. We could
>>>>>>>> also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>> T-SQL
>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>> if it's somehow advantageous for us to leverage access patterns or restrict
>>>>>>>> DML statements.
>>>>>>>>
>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>
>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Community,
>>>>>>>>>
>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>> statements to register virtual tables from external storage systems (e.g.
>>>>>>>>> BigQuery).
>>>>>>>>>
>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered by
>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a LOCATION and
>>>>>>>>> BeamSQL will register the table outside of current execution environment
>>>>>>>>> based on LOCATION.
>>>>>>>>>
>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL
>>>>>>>>> to help users understand they are registering tables, and BeamSQL does not
>>>>>>>>> create non existing tables by running CREATE TABLE (at least on some
>>>>>>>>> storage systems, if not all).
>>>>>>>>>
>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>
>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>
>>>>>>>>> Pros:
>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>> TABLE.
>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>> future if we want CREATE TABLE to create, rather than not only register
>>>>>>>>> tables in BeamSQL.
>>>>>>>>>
>>>>>>>>> Cons:
>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>>>>>>>>> pipelines which has CREATE TABLE require changes.
>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>> especially in SQL Shell.
>>>>>>>>>
>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>> pros and cons above.
>>>>>>>>>
>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or
>>>>>>>>> optional?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rui
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>
>>>
>>> --
>>> ----
>>> Mingmin
>>>
>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Rui Wang <ru...@google.com>.
Thanks Mikhail! "Import" is an alternative option. It might be better.

"create external" is being widely used by different systems with similar
meaning so "create" usually is ok to external data sources.

-Rui

On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <mi...@google.com> wrote:

> The idea of clarification sounds good to me. I'd appreciate that present,
> when I was triaging post-commit tests.
>
> Do we have any terms that specify connection to external table? "CREATE"
> word triggers this reaction in my brain that there will be a new table
> created. Adding "EXTERNAL" would already add distinction, but adding
> something more explicit for the task might be even better.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <rf...@google.com>
> wrote:
>
>> Strictly speaking, they are not necessarily tables either. We could also
>> introduce something like CREATE EXTERNAL DATA SOURCE (a-la T-SQL
>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>> if it's somehow advantageous for us to leverage access patterns or restrict
>> DML statements.
>>
>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>
>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com> wrote:
>>
>>> Hi Community,
>>>
>>> BeamSQL allows CREATE TABLE
>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>> statements to register virtual tables from external storage systems (e.g.
>>> BigQuery).
>>>
>>> BeamSQL is not a storage system, so any table registered by "CREATE
>>> TABLE" statement is essentially equivalent to be registered by "CREATE
>>> EXTERNAL TABLE", which requires the user to provide a LOCATION and BeamSQL
>>> will register the table outside of current execution environment based on
>>> LOCATION.
>>>
>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL to
>>> help users understand they are registering tables, and BeamSQL does not
>>> create non existing tables by running CREATE TABLE (at least on some
>>> storage systems, if not all).
>>>
>>> We can make the EXTERNAL keyword either required or optional.
>>>
>>> If we make the EXTERNAL keyword required:
>>>
>>> Pros:
>>> a. We can get rid of the registering table semantic on CREATE TABLE.
>>> b, We keep the room that we could add CREATE TABLE back in the future if
>>> we want CREATE TABLE to create, rather than not only register tables in
>>> BeamSQL.
>>>
>>> Cons:
>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>>> pipelines which has CREATE TABLE require changes.
>>> 2. It's required to type tedious EXTERNAL keyword every time, especially
>>> in SQL Shell.
>>>
>>> If we make the EXTERNAL keyword optional, we will have reversed pros and
>>> cons above.
>>>
>>> Any thoughts on adding EXTERNAL keyword, and make it required or
>>> optional?
>>>
>>>
>>> Thanks,
>>> Rui
>>>
>>>
>>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Mikhail Gryzykhin <mi...@google.com>.
The idea of clarification sounds good to me. I'd appreciate that present,
when I was triaging post-commit tests.

Do we have any terms that specify connection to external table? "CREATE"
word triggers this reaction in my brain that there will be a new table
created. Adding "EXTERNAL" would already add distinction, but adding
something more explicit for the task might be even better.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <rf...@google.com>
wrote:

> Strictly speaking, they are not necessarily tables either. We could also
> introduce something like CREATE EXTERNAL DATA SOURCE (a-la T-SQL
> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
> if it's somehow advantageous for us to leverage access patterns or restrict
> DML statements.
>
> I think your idea of CREATE EXTERNAL TABLE is practical :)
>
> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com> wrote:
>
>> Hi Community,
>>
>> BeamSQL allows CREATE TABLE
>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>> statements to register virtual tables from external storage systems (e.g.
>> BigQuery).
>>
>> BeamSQL is not a storage system, so any table registered by "CREATE
>> TABLE" statement is essentially equivalent to be registered by "CREATE
>> EXTERNAL TABLE", which requires the user to provide a LOCATION and BeamSQL
>> will register the table outside of current execution environment based on
>> LOCATION.
>>
>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL to help
>> users understand they are registering tables, and BeamSQL does not create
>> non existing tables by running CREATE TABLE (at least on some storage
>> systems, if not all).
>>
>> We can make the EXTERNAL keyword either required or optional.
>>
>> If we make the EXTERNAL keyword required:
>>
>> Pros:
>> a. We can get rid of the registering table semantic on CREATE TABLE.
>> b, We keep the room that we could add CREATE TABLE back in the future if
>> we want CREATE TABLE to create, rather than not only register tables in
>> BeamSQL.
>>
>> Cons:
>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>> pipelines which has CREATE TABLE require changes.
>> 2. It's required to type tedious EXTERNAL keyword every time, especially
>> in SQL Shell.
>>
>> If we make the EXTERNAL keyword optional, we will have reversed pros and
>> cons above.
>>
>> Any thoughts on adding EXTERNAL keyword, and make it required or optional?
>>
>>
>> Thanks,
>> Rui
>>
>>
>>

Re: [Discuss] Add EXTERNAL keyword to CREATE TABLE statement

Posted by Rafael Fernandez <rf...@google.com>.
Strictly speaking, they are not necessarily tables either. We could also
introduce something like CREATE EXTERNAL DATA SOURCE (a-la T-SQL
<https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
if it's somehow advantageous for us to leverage access patterns or restrict
DML statements.

I think your idea of CREATE EXTERNAL TABLE is practical :)

On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <ru...@google.com> wrote:

> Hi Community,
>
> BeamSQL allows CREATE TABLE
> <https://beam.apache.org/documentation/dsls/sql/create-table/> statements
> to register virtual tables from external storage systems (e.g. BigQuery).
>
> BeamSQL is not a storage system, so any table registered by "CREATE TABLE"
> statement is essentially equivalent to be registered by "CREATE EXTERNAL
> TABLE", which requires the user to provide a LOCATION and BeamSQL will
> register the table outside of current execution environment based on
> LOCATION.
>
> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL to help
> users understand they are registering tables, and BeamSQL does not create
> non existing tables by running CREATE TABLE (at least on some storage
> systems, if not all).
>
> We can make the EXTERNAL keyword either required or optional.
>
> If we make the EXTERNAL keyword required:
>
> Pros:
> a. We can get rid of the registering table semantic on CREATE TABLE.
> b, We keep the room that we could add CREATE TABLE back in the future if
> we want CREATE TABLE to create, rather than not only register tables in
> BeamSQL.
>
> Cons:
> 1. CREATE TABLE syntax will not be supported so existing BeamSQL pipelines
> which has CREATE TABLE require changes.
> 2. It's required to type tedious EXTERNAL keyword every time, especially
> in SQL Shell.
>
> If we make the EXTERNAL keyword optional, we will have reversed pros and
> cons above.
>
> Any thoughts on adding EXTERNAL keyword, and make it required or optional?
>
>
> Thanks,
> Rui
>
>
>