You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by Julian Hyde <jh...@apache.org> on 2014/12/03 22:44:11 UTC

Dynamic columns a la Phoenix

Phoenix has a neat extension to SQL for declaring dynamic columns as
part of a query. This is for columns that are present in the
schema-less table but we didn’t declare them when we issued the CREATE
TABLE statement.

http://phoenix.apache.org/dynamic_columns.html

James Taylor is proposing adding it to Calcite’s SQL:
https://issues.apache.org/jira/browse/CALCITE-493

Drill developers, Would this be a useful syntax extension for Drill? I
think it has some advantages over Drill's implicit, type-less columns.

Julian

Re: Dynamic columns a la Phoenix

Posted by Julian Hyde <ju...@hydromatic.net>.

On Dec 4, 2014, at 6:09 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Very cool! Interesting approach.   It is kind of like create external table
> as a sub clause of the query.
> 
> My question would be why it isn't just done in the select list.   I'm
> generally against adding constructs and new clauses in the parser,
> especially with repetitive information.

These “undeclared” columns can’t be added to the select list because you can’t reference something that doesn’t exist. (Well, you can, and Drill does, but the approach of silently creating something has issues. Weak typing zealots would shrug off those issues, and strong typing zealots would be irritated by them.)

On balance, it doesn’t seem too repetitive: you’re just declaring the new columns; and you’re declaring them just once; and it is an opportunity to provide a type, which quite often reduces the verbosity of the code elsewhere.

This is not clear-cut decision for me. As Ted says, it doesn’t solve all problems, and I see the benefits of other approaches too. But this is a nice application of the principle of schema-on-read — arguably the most important thing that has happened to SQL in the last 20 years — bringing it out of DDL into the query.  So I’m inclined to add this construct to Calcite’s core SQL grammar, and if you don’t like it, don’t use it, and it won’t get in your way.

> We choose to use the existing CONVERT_FROM from sql to solve the datatype
> problem.  It's a bit verbose but avoids introducing new concepts to
> existing sql.  To simplify it, I've considered the other casting form which
> I believe is similar to the Java casting prefix. It would then be:
> 
> SELECT (varchar) col1 FROM t1
> 
> This doesn't distinguish between dynamic and existing columns.  We've shown
> that you don't need to do that but that is pretty complicated to do. If you
> wanted to avoid that,  maybe you could prefix the type. So a mix of dynamic
> and known might look like this :
> 
> SELECT (?varchar) col1, col2 FROM t1

I don’t know CONVERT_FROM. Did you consider using CAST? E.g. CAST(x AS VARCHAR(10)).

Cast is very flexible, and unlike Java casts, it does conversions specific to source and target type. In Java 

Object o = new Double(3.1415);
System.out.println((String) o));

will throw ClassCastException, but SQL

select cast(o as VARCHAR(10)) from (values (3.1415)) as t(o);

will print ‘3.1415’.

> That being said, I've worked with the parser and can guess the pain of
> trying to add it as optional to the existing select list parsing.
> Especially given scalar sub queries.

If you’re talking about the Java-style casting syntax with parentheses. Let’s just not go there. It doesn’t belong in SQL any more than && for AND.

> If merged (even in parser definition files) I would definitely want it to
> be optional as it is.  That namespace is already cluttered and I think we
> should only merge new language features into core if they are sql spec or
> already used by the majority of projects. What about a language features
> contrib module?

My philosophy is to stick with the standard almost all of the time. I justify going beyond the standard if there is no syntactic class with standard SQL and it is an area not addressed by the standard. In my opinion this feature — and by the way JSON literals — falls into this category.

Look at some of the other JIRA cases tagged “phoenix” and you will see that I am suggesting putting those into a Phoenix-specific sub-class of the parser, because they don’t meet that high standard.

Julian

Re: Dynamic columns a la Phoenix

Posted by Julian Hyde <ju...@hydromatic.net>.

On Dec 4, 2014, at 6:09 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Very cool! Interesting approach.   It is kind of like create external table
> as a sub clause of the query.
> 
> My question would be why it isn't just done in the select list.   I'm
> generally against adding constructs and new clauses in the parser,
> especially with repetitive information.

These “undeclared” columns can’t be added to the select list because you can’t reference something that doesn’t exist. (Well, you can, and Drill does, but the approach of silently creating something has issues. Weak typing zealots would shrug off those issues, and strong typing zealots would be irritated by them.)

On balance, it doesn’t seem too repetitive: you’re just declaring the new columns; and you’re declaring them just once; and it is an opportunity to provide a type, which quite often reduces the verbosity of the code elsewhere.

This is not clear-cut decision for me. As Ted says, it doesn’t solve all problems, and I see the benefits of other approaches too. But this is a nice application of the principle of schema-on-read — arguably the most important thing that has happened to SQL in the last 20 years — bringing it out of DDL into the query.  So I’m inclined to add this construct to Calcite’s core SQL grammar, and if you don’t like it, don’t use it, and it won’t get in your way.

> We choose to use the existing CONVERT_FROM from sql to solve the datatype
> problem.  It's a bit verbose but avoids introducing new concepts to
> existing sql.  To simplify it, I've considered the other casting form which
> I believe is similar to the Java casting prefix. It would then be:
> 
> SELECT (varchar) col1 FROM t1
> 
> This doesn't distinguish between dynamic and existing columns.  We've shown
> that you don't need to do that but that is pretty complicated to do. If you
> wanted to avoid that,  maybe you could prefix the type. So a mix of dynamic
> and known might look like this :
> 
> SELECT (?varchar) col1, col2 FROM t1

I don’t know CONVERT_FROM. Did you consider using CAST? E.g. CAST(x AS VARCHAR(10)).

Cast is very flexible, and unlike Java casts, it does conversions specific to source and target type. In Java 

Object o = new Double(3.1415);
System.out.println((String) o));

will throw ClassCastException, but SQL

select cast(o as VARCHAR(10)) from (values (3.1415)) as t(o);

will print ‘3.1415’.

> That being said, I've worked with the parser and can guess the pain of
> trying to add it as optional to the existing select list parsing.
> Especially given scalar sub queries.

If you’re talking about the Java-style casting syntax with parentheses. Let’s just not go there. It doesn’t belong in SQL any more than && for AND.

> If merged (even in parser definition files) I would definitely want it to
> be optional as it is.  That namespace is already cluttered and I think we
> should only merge new language features into core if they are sql spec or
> already used by the majority of projects. What about a language features
> contrib module?

My philosophy is to stick with the standard almost all of the time. I justify going beyond the standard if there is no syntactic class with standard SQL and it is an area not addressed by the standard. In my opinion this feature — and by the way JSON literals — falls into this category.

Look at some of the other JIRA cases tagged “phoenix” and you will see that I am suggesting putting those into a Phoenix-specific sub-class of the parser, because they don’t meet that high standard.

Julian

Re: Dynamic columns a la Phoenix

Posted by Jacques Nadeau <ja...@apache.org>.

Very cool! Interesting approach.   It is kind of like create external table
as a sub clause of the query.

My question would be why it isn't just done in the select list.   I'm
generally against adding constructs and new clauses in the parser,
especially with repetitive information.

We choose to use the existing CONVERT_FROM from sql to solve the datatype
problem.  It's a bit verbose but avoids introducing new concepts to
existing sql.  To simplify it, I've considered the other casting form which
I believe is similar to the Java casting prefix. It would then be:

SELECT (varchar) col1 FROM t1

This doesn't distinguish between dynamic and existing columns.  We've shown
that you don't need to do that but that is pretty complicated to do. If you
wanted to avoid that,  maybe you could prefix the type. So a mix of dynamic
and known might look like this :

SELECT (?varchar) col1, col2 FROM t1

That being said, I've worked with the parser and can guess the pain of
trying to add it as optional to the existing select list parsing.
Especially given scalar sub queries.

If merged (even in parser definition files) I would definitely want it to
be optional as it is.  That namespace is already cluttered and I think we
should only merge new language features into core if they are sql spec or
already used by the majority of projects. What about a language features
contrib module?

All in all,  very nice work James etc al.
On Dec 4, 2014 4:25 AM, "Ted Dunning" <te...@gmail.com> wrote:

> Might make an interesting alternative, but I don't think that it suffices
> for all needs.  In particular, there are definitely times when I don't know
> what columns to declare.
>
>
>
> On Wed, Dec 3, 2014 at 1:44 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Phoenix has a neat extension to SQL for declaring dynamic columns as
> > part of a query. This is for columns that are present in the
> > schema-less table but we didn’t declare them when we issued the CREATE
> > TABLE statement.
> >
> > http://phoenix.apache.org/dynamic_columns.html
> >
> > James Taylor is proposing adding it to Calcite’s SQL:
> > https://issues.apache.org/jira/browse/CALCITE-493
> >
> > Drill developers, Would this be a useful syntax extension for Drill? I
> > think it has some advantages over Drill's implicit, type-less columns.
> >
> > Julian
> >
>

Re: Dynamic columns a la Phoenix

Posted by Jacques Nadeau <ja...@apache.org>.

Very cool! Interesting approach.   It is kind of like create external table
as a sub clause of the query.

My question would be why it isn't just done in the select list.   I'm
generally against adding constructs and new clauses in the parser,
especially with repetitive information.

We choose to use the existing CONVERT_FROM from sql to solve the datatype
problem.  It's a bit verbose but avoids introducing new concepts to
existing sql.  To simplify it, I've considered the other casting form which
I believe is similar to the Java casting prefix. It would then be:

SELECT (varchar) col1 FROM t1

This doesn't distinguish between dynamic and existing columns.  We've shown
that you don't need to do that but that is pretty complicated to do. If you
wanted to avoid that,  maybe you could prefix the type. So a mix of dynamic
and known might look like this :

SELECT (?varchar) col1, col2 FROM t1

That being said, I've worked with the parser and can guess the pain of
trying to add it as optional to the existing select list parsing.
Especially given scalar sub queries.

If merged (even in parser definition files) I would definitely want it to
be optional as it is.  That namespace is already cluttered and I think we
should only merge new language features into core if they are sql spec or
already used by the majority of projects. What about a language features
contrib module?

All in all,  very nice work James etc al.
On Dec 4, 2014 4:25 AM, "Ted Dunning" <te...@gmail.com> wrote:

> Might make an interesting alternative, but I don't think that it suffices
> for all needs.  In particular, there are definitely times when I don't know
> what columns to declare.
>
>
>
> On Wed, Dec 3, 2014 at 1:44 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Phoenix has a neat extension to SQL for declaring dynamic columns as
> > part of a query. This is for columns that are present in the
> > schema-less table but we didn’t declare them when we issued the CREATE
> > TABLE statement.
> >
> > http://phoenix.apache.org/dynamic_columns.html
> >
> > James Taylor is proposing adding it to Calcite’s SQL:
> > https://issues.apache.org/jira/browse/CALCITE-493
> >
> > Drill developers, Would this be a useful syntax extension for Drill? I
> > think it has some advantages over Drill's implicit, type-less columns.
> >
> > Julian
> >
>

Re: Dynamic columns a la Phoenix

Posted by Ted Dunning <te...@gmail.com>.

Might make an interesting alternative, but I don't think that it suffices
for all needs.  In particular, there are definitely times when I don't know
what columns to declare.

On Wed, Dec 3, 2014 at 1:44 PM, Julian Hyde <jh...@apache.org> wrote:

> Phoenix has a neat extension to SQL for declaring dynamic columns as
> part of a query. This is for columns that are present in the
> schema-less table but we didn’t declare them when we issued the CREATE
> TABLE statement.
>
> http://phoenix.apache.org/dynamic_columns.html
>
> James Taylor is proposing adding it to Calcite’s SQL:
> https://issues.apache.org/jira/browse/CALCITE-493
>
> Drill developers, Would this be a useful syntax extension for Drill? I
> think it has some advantages over Drill's implicit, type-less columns.
>
> Julian
>

Re: Dynamic columns a la Phoenix

Posted by Ted Dunning <te...@gmail.com>.

Might make an interesting alternative, but I don't think that it suffices
for all needs.  In particular, there are definitely times when I don't know
what columns to declare.

On Wed, Dec 3, 2014 at 1:44 PM, Julian Hyde <jh...@apache.org> wrote:

> Phoenix has a neat extension to SQL for declaring dynamic columns as
> part of a query. This is for columns that are present in the
> schema-less table but we didn’t declare them when we issued the CREATE
> TABLE statement.
>
> http://phoenix.apache.org/dynamic_columns.html
>
> James Taylor is proposing adding it to Calcite’s SQL:
> https://issues.apache.org/jira/browse/CALCITE-493
>
> Drill developers, Would this be a useful syntax extension for Drill? I
> think it has some advantages over Drill's implicit, type-less columns.
>
> Julian
>