You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Julien Le Dem <ju...@dremio.com> on 2015/11/04 02:17:35 UTC

Re: select from table with options

Thanks Julian,
I have looked into using Table Functions in Drill. I had to make some
modifications to the planner so that the function lookup in the Storage
plugin works. I will submit a patch for that.

I had a few questions:
 *1)* For this particular use case it seems that we could use TableMacro as
all the logic can be happening in the planner. Should I look into that?
   - Drill Schema returns a DrillTable (which implements Table).
   - A TableMacro returns a TranslatableTable
   - It is not clear to me what a TableFunction returns as it defines only
methods that return types.
 Ideally I'd like to produce a DrillTable like getTable in Schema, the only
difference with getTable is that we use the function parameters when
producing a table.
For reference: Drill getTable there:
https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
It indirectly calls:
https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317

 *2)* The getFunctions method in Schema does not seem to be aware at all of
the context it is called in. I would want to return different functions
depending on where we are in the query (table functions in the from clause,
regular functions in where). Is there a way to know if we are in the
context of a FROM or a WHERE clause?

 *3)* is the table(...) wrapping syntax necessary?
Note:
  - In Drill back ticks are use for identifiers containing dot or slash.
like the path to the file as a table name: dfs.`/path/to/file.ext`
  - single quotes are used to delimit strings: 'my string passed as a
parameter'

  The current syntax is something like:
*     select * from table(dfs.delimitedFile(path => '/path/to/file',
delimiter => '|'))*
*     select * from table(dfs.`**/path/to/file`**(type => 'text', delimiter
=> '|'))*
*     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
It seems that table(...) is redundant since we are in the from clause.
 It could simply be:
*     select * from dfs.delimitedFile(path => '/path/to/file', delimiter =>
'|')*
*     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
'|')*
*     select * from dfs.`**/path/to/file`**(type => 'json')*

 *4)* Can a table be a parameter? If yes, how do we declare a table
parameter? (not the backticks instead of single quotes)
*     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
delimiter => '|')*

Thank you!
Julien


On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:

> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> > Agreed. We need both select with option and .drill (by etl process or by
> > sql ascribe metadata).
> >
> > Let's start with the select with options. My only goal would be to make
> > sure that creation of .drill file through SQL uses a similar pattern to
> the
> > select with options. It is also important that tables names are still
> > expressed as identifiers instead of strings (people already have enough
> > trouble with remembering whether to use single quotes or backticks). If
> the
> > table function approach is everybody's preferred approach, I think it is
> > important to have named parameters per Julian's notes.
> >
> > @Julian, how hard do you think it will be to add named parameters?
>
> I just checked in a fix for
> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>



-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks! Let me know if you need help.

> On Nov 13, 2015, at 9:29 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Here you go: https://issues.apache.org/jira/browse/CALCITE-967
> I was planning on providing patch for both master and the fork, but I
> haven't started yet.
> 
> On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> You’re hitting the grammar ambiguity I expected.
>> 
>> I think that base Calcite should require the full verbose syntax: the
>> TABLE keyword for table functions and the EXTEND keyword for extends
>> clauses. Then Drill can override to make TABLE optional, and Phoenix can
>> override to make EXTEND optional.
>> 
>> Are you changing the parser in your forked copy of Calcite, or are you
>> changing Drill’s extensions to that parser?
>> 
>> If the former, you (or I) should add extension points to Calcite’s parser
>> make the TABLE keyword optional and to make the EXTEND keyword optional. No
>> project should enable both extension points — otherwise they’ll end up with
>> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>> 
>> Julian
>> 
>> 
>>> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> 
>>> Hi,
>>> I've been trying to enable this but it looks like in the current grammar
>>> (before my change) you can not use table functions and EXTEND together.
>>> That's because they are on difference branches of an | in the grammar.
>>> So I would suggest that we treat those as two separate improvement in two
>>> different pull requests:
>>> - not require table(...) to call table functions
>>> - allow using table functions and extend together.
>>> Does it make sense?
>>> Julien
>>> 
>>> 
>>> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> To be clear, it should be possible to use a table function with all of
>>>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>>>> aliases, TABLESAMPLE.
>>>> 
>>>> I'm surprised that the parser didn't need more lookahead to choose
>>>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>>>> 
>>>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> In the patch I just sent, probably not.
>>>>> I will adjust it and add the corresponding test.
>>>>> 
>>>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>> 
>>>>>> Can you use both together? Say
>>>>>> 
>>>>>> select columns
>>>>>> from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>>>> EXTEND
>>>>>> (foo INTEGER)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>>>> wrote:
>>>>>>> 
>>>>>>> I took a stab at adding the TableFunction syntax without table(...)
>> in
>>>>>>> Calcite.
>>>>>>> I have verified that both the table function and extend (with or
>>>> without
>>>>>>> keyword) work
>>>>>>> 
>>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>>>> 
>>>>>>> These work:
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>> fieldDelimiter
>>>>>> =>
>>>>>>> '|')
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>>>> fieldDelimiter => '|'))
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>>>> we
>>>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> That being said, I think we could make these work together... maybe.
>>>>>>>> 
>>>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>>>> 
>>>>>>>> Create a new sub block in the table block that requires no keyword.
>>>>>> There
>>>>>>>> would be two paths (and would probably require some lookahead)
>>>>>>>> 
>>>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>>>> golfHandicap INTEGER)
>>>>>>>> 
>>>>>>>> Then we create a table function with options 1 & 2, an EXTENDS
>> clause
>>>>>> for
>>>>>>>> option 3.
>>>>>>>> 
>>>>>>>> Best of both worlds?
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
>> jamestaylor@apache.org
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Phoenix already supports columns at read-time using the syntax
>>>> without
>>>>>>>> the
>>>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>> 
>>>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>>>> backward
>>>>>>>>> compatibility problem.
>>>>>>>>> 
>>>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>>>> Phoenix
>>>>>>>>> given our Drillix initiative.
>>>>>>>>> 
>>>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> My proposal was an a or b using the freemarker template in the
>>>>>> grammar,
>>>>>>>>>> not something later.
>>>>>>>>>> 
>>>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>>>> only
>>>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>>>> should
>>>>>>>>> be
>>>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>>>> (the
>>>>>>>>> same
>>>>>>>>>> way we plug in external things in Drill).
>>>>>>>>>> 
>>>>>>>>>> Trying to get every project to agree on extensions seems like it
>>>> may
>>>>>> be
>>>>>>>>>> hard.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>>>> 
>>>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
>> be
>>>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>>>> later.
>>>>>>>>>>> 
>>>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>>>> avoiding
>>>>>> a
>>>>>>>>>>> tough design discussion.
>>>>>>>>>>> 
>>>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
>> and
>>>>>>>> they
>>>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>>>> Phoenix
>>>>>>>> (by
>>>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>>>> SQL
>>>>>>>>> syntax
>>>>>>>>>>> should be.
>>>>>>>>>>> 
>>>>>>>>>>> Julian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>>>> appears
>>>>>>>>>>> more
>>>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>>>> jacques@dremio.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>>>> we
>>>>>>>>> could
>>>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>>>> seems
>>>>>>>>>>>>> reasonable to support the table functions without the TABLE
>>>> syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>>>> confusing
>>>>>>>> to
>>>>>>>>>>> use,
>>>>>>>>>>>>> especially in the example that we're looking to support, such
>>>> as:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>> fieldDelimiter
>>>>>>>>> =>
>>>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter
>>>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>>>> goal).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for doing the legwork and finding what the other
>> vendors
>>>>>>>> do.
>>>>>>>>>>> It is
>>>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>>> standard
>>>>>>>>>>> an
>>>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>>>> clash
>>>>>>>>>>> with
>>>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
>> the
>>>>>>>>>>> EXTENDS
>>>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>>>> Emp
>>>>>>>>> table
>>>>>>>>>>>>> but
>>>>>>>>>>>>>> you would like to use them in this particular query. We chose
>>>> to
>>>>>>>>> make
>>>>>>>>>>> the
>>>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> so we would require
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>>>> forms
>>>>>>>>>>> like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>>>> should
>>>>>>>>>>>>> also
>>>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
>>>>>>>> there's
>>>>>>>>> no
>>>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
>>>> calling
>>>>>>>>> table
>>>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
>>>> DB2
>>>>>>>>> [4]
>>>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>>>>>>>> generation
>>>>>>>>> to
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
>>>> MySQL
>>>>>>>>> like
>>>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> 
>>>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>>>> [3]
>>>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>>>> [4]
>>>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>>>> function
>>>>>>>>>>>>>>> overloading:
>> https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>>>> called
>>>>>>>>>>> more
>>>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>>>> want
>>>>>>>>> to
>>>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
>> jhyde@apache.org
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
>> julien@dremio.com
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>>>> Calcite
>>>>>>>>>>> when
>>>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>>>> now
>>>>>>>>> just
>>>>>>>>>>>>>> being
>>>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>>>> are
>>>>>>>>> the
>>>>>>>>>>>>> way
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>>>> syntax
>>>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>>>> similar.
>>>>>>>>> Also,
>>>>>>>>>>>>> if
>>>>>>>>>>>>>> f
>>>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>>>> this?:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>>>> standards
>>>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>>>> disambiguate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>>>> used
>>>>>>>>> in
>>>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix
>> it.
>>>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name,
>> we
>>>>>>>> get
>>>>>>>>> a
>>>>>>>>>>>>> NPE
>>>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>>>> a
>>>>>>>>> JIRA
>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with
>> named
>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>>>> functions.)
>>>>>>>>>>> The
>>>>>>>>>>>>>> fix
>>>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> *Jim Scott*
>>>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>>>> 
>>>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>>>> 
>>>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>>>> <
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Julien
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks! Let me know if you need help.

> On Nov 13, 2015, at 9:29 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Here you go: https://issues.apache.org/jira/browse/CALCITE-967
> I was planning on providing patch for both master and the fork, but I
> haven't started yet.
> 
> On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> You’re hitting the grammar ambiguity I expected.
>> 
>> I think that base Calcite should require the full verbose syntax: the
>> TABLE keyword for table functions and the EXTEND keyword for extends
>> clauses. Then Drill can override to make TABLE optional, and Phoenix can
>> override to make EXTEND optional.
>> 
>> Are you changing the parser in your forked copy of Calcite, or are you
>> changing Drill’s extensions to that parser?
>> 
>> If the former, you (or I) should add extension points to Calcite’s parser
>> make the TABLE keyword optional and to make the EXTEND keyword optional. No
>> project should enable both extension points — otherwise they’ll end up with
>> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>> 
>> Julian
>> 
>> 
>>> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> 
>>> Hi,
>>> I've been trying to enable this but it looks like in the current grammar
>>> (before my change) you can not use table functions and EXTEND together.
>>> That's because they are on difference branches of an | in the grammar.
>>> So I would suggest that we treat those as two separate improvement in two
>>> different pull requests:
>>> - not require table(...) to call table functions
>>> - allow using table functions and extend together.
>>> Does it make sense?
>>> Julien
>>> 
>>> 
>>> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> To be clear, it should be possible to use a table function with all of
>>>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>>>> aliases, TABLESAMPLE.
>>>> 
>>>> I'm surprised that the parser didn't need more lookahead to choose
>>>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>>>> 
>>>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> In the patch I just sent, probably not.
>>>>> I will adjust it and add the corresponding test.
>>>>> 
>>>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>> 
>>>>>> Can you use both together? Say
>>>>>> 
>>>>>> select columns
>>>>>> from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>>>> EXTEND
>>>>>> (foo INTEGER)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>>>> wrote:
>>>>>>> 
>>>>>>> I took a stab at adding the TableFunction syntax without table(...)
>> in
>>>>>>> Calcite.
>>>>>>> I have verified that both the table function and extend (with or
>>>> without
>>>>>>> keyword) work
>>>>>>> 
>>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>>>> 
>>>>>>> These work:
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>> fieldDelimiter
>>>>>> =>
>>>>>>> '|')
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>>>> fieldDelimiter => '|'))
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>>>> we
>>>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> That being said, I think we could make these work together... maybe.
>>>>>>>> 
>>>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>>>> 
>>>>>>>> Create a new sub block in the table block that requires no keyword.
>>>>>> There
>>>>>>>> would be two paths (and would probably require some lookahead)
>>>>>>>> 
>>>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>>>> golfHandicap INTEGER)
>>>>>>>> 
>>>>>>>> Then we create a table function with options 1 & 2, an EXTENDS
>> clause
>>>>>> for
>>>>>>>> option 3.
>>>>>>>> 
>>>>>>>> Best of both worlds?
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
>> jamestaylor@apache.org
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Phoenix already supports columns at read-time using the syntax
>>>> without
>>>>>>>> the
>>>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>> 
>>>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>>>> backward
>>>>>>>>> compatibility problem.
>>>>>>>>> 
>>>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>>>> Phoenix
>>>>>>>>> given our Drillix initiative.
>>>>>>>>> 
>>>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> My proposal was an a or b using the freemarker template in the
>>>>>> grammar,
>>>>>>>>>> not something later.
>>>>>>>>>> 
>>>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>>>> only
>>>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>>>> should
>>>>>>>>> be
>>>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>>>> (the
>>>>>>>>> same
>>>>>>>>>> way we plug in external things in Drill).
>>>>>>>>>> 
>>>>>>>>>> Trying to get every project to agree on extensions seems like it
>>>> may
>>>>>> be
>>>>>>>>>> hard.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>>>> 
>>>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
>> be
>>>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>>>> later.
>>>>>>>>>>> 
>>>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>>>> avoiding
>>>>>> a
>>>>>>>>>>> tough design discussion.
>>>>>>>>>>> 
>>>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
>> and
>>>>>>>> they
>>>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>>>> Phoenix
>>>>>>>> (by
>>>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>>>> SQL
>>>>>>>>> syntax
>>>>>>>>>>> should be.
>>>>>>>>>>> 
>>>>>>>>>>> Julian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>>>> appears
>>>>>>>>>>> more
>>>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>>>> jacques@dremio.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>>>> we
>>>>>>>>> could
>>>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>>>> seems
>>>>>>>>>>>>> reasonable to support the table functions without the TABLE
>>>> syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>>>> confusing
>>>>>>>> to
>>>>>>>>>>> use,
>>>>>>>>>>>>> especially in the example that we're looking to support, such
>>>> as:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>> fieldDelimiter
>>>>>>>>> =>
>>>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter
>>>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>>>> goal).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for doing the legwork and finding what the other
>> vendors
>>>>>>>> do.
>>>>>>>>>>> It is
>>>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>>> standard
>>>>>>>>>>> an
>>>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>>>> clash
>>>>>>>>>>> with
>>>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
>> the
>>>>>>>>>>> EXTENDS
>>>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>>>> Emp
>>>>>>>>> table
>>>>>>>>>>>>> but
>>>>>>>>>>>>>> you would like to use them in this particular query. We chose
>>>> to
>>>>>>>>> make
>>>>>>>>>>> the
>>>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> so we would require
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>>>> forms
>>>>>>>>>>> like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>>>> should
>>>>>>>>>>>>> also
>>>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
>>>>>>>> there's
>>>>>>>>> no
>>>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
>>>> calling
>>>>>>>>> table
>>>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
>>>> DB2
>>>>>>>>> [4]
>>>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>>>>>>>> generation
>>>>>>>>> to
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
>>>> MySQL
>>>>>>>>> like
>>>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> 
>>>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>>>> [3]
>>>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>>>> [4]
>>>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>>>> function
>>>>>>>>>>>>>>> overloading:
>> https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>>>> called
>>>>>>>>>>> more
>>>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>>>> want
>>>>>>>>> to
>>>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
>> jhyde@apache.org
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
>> julien@dremio.com
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>>>> Calcite
>>>>>>>>>>> when
>>>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>>>> now
>>>>>>>>> just
>>>>>>>>>>>>>> being
>>>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>>>> are
>>>>>>>>> the
>>>>>>>>>>>>> way
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>>>> syntax
>>>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>>>> similar.
>>>>>>>>> Also,
>>>>>>>>>>>>> if
>>>>>>>>>>>>>> f
>>>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>>>> this?:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>>>> standards
>>>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>>>> disambiguate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>>>> used
>>>>>>>>> in
>>>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix
>> it.
>>>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name,
>> we
>>>>>>>> get
>>>>>>>>> a
>>>>>>>>>>>>> NPE
>>>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>>>> a
>>>>>>>>> JIRA
>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with
>> named
>>>>>>>>>>>>> arguments.
>>>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>>>> functions.)
>>>>>>>>>>> The
>>>>>>>>>>>>>> fix
>>>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> *Jim Scott*
>>>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>>>> 
>>>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>>>> 
>>>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>>>> <
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Julien
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Here you go: https://issues.apache.org/jira/browse/CALCITE-967
I was planning on providing patch for both master and the fork, but I
haven't started yet.

On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:

> You’re hitting the grammar ambiguity I expected.
>
> I think that base Calcite should require the full verbose syntax: the
> TABLE keyword for table functions and the EXTEND keyword for extends
> clauses. Then Drill can override to make TABLE optional, and Phoenix can
> override to make EXTEND optional.
>
> Are you changing the parser in your forked copy of Calcite, or are you
> changing Drill’s extensions to that parser?
>
> If the former, you (or I) should add extension points to Calcite’s parser
> make the TABLE keyword optional and to make the EXTEND keyword optional. No
> project should enable both extension points — otherwise they’ll end up with
> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>
> Julian
>
>
> > On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > Hi,
> > I've been trying to enable this but it looks like in the current grammar
> > (before my change) you can not use table functions and EXTEND together.
> > That's because they are on difference branches of an | in the grammar.
> > So I would suggest that we treat those as two separate improvement in two
> > different pull requests:
> > - not require table(...) to call table functions
> > - allow using table functions and extend together.
> > Does it make sense?
> > Julien
> >
> >
> > On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> To be clear, it should be possible to use a table function with all of
> >> the options -- EXTENDS clause, OVER clause, AS with alias and column
> >> aliases, TABLESAMPLE.
> >>
> >> I'm surprised that the parser didn't need more lookahead to choose
> >> between 't (x, y)' and 't (x INTEGER, y DATE)'.
> >>
> >> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >>> In the patch I just sent, probably not.
> >>> I will adjust it and add the corresponding test.
> >>>
> >>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>
> >>>> Can you use both together? Say
> >>>>
> >>>>  select columns
> >>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
> >> EXTEND
> >>>> (foo INTEGER)
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >>>>>
> >>>>> I took a stab at adding the TableFunction syntax without table(...)
> in
> >>>>> Calcite.
> >>>>> I have verified that both the table function and extend (with or
> >> without
> >>>>> keyword) work
> >>>>>
> >>>>
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >>>>>
> >>>>> These work:
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> >> fieldDelimiter
> >>>> =>
> >>>>> '|')
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >>>>> fieldDelimiter => '|'))
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`('JSON')
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >>>>>
> >>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> >>>> wrote:
> >>>>>
> >>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
> >> we
> >>>>>> should constrain ourselves to having a union of the two syntaxes.
> >>>>>>
> >>>>>>
> >>>>>> That being said, I think we could make these work together... maybe.
> >>>>>>
> >>>>>> Remove the EXTENDS without keyword syntax from the grammar.
> >>>>>>
> >>>>>> Create a new sub block in the table block that requires no keyword.
> >>>> There
> >>>>>> would be two paths (and would probably require some lookahead)
> >>>>>>
> >>>>>> option 1> unnamed parameters (1,2,3)
> >>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
> >>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >>>>>> golfHandicap INTEGER)
> >>>>>>
> >>>>>> Then we create a table function with options 1 & 2, an EXTENDS
> clause
> >>>> for
> >>>>>> option 3.
> >>>>>>
> >>>>>> Best of both worlds?
> >>>>>>
> >>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
> jamestaylor@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Phoenix already supports columns at read-time using the syntax
> >> without
> >>>>>> the
> >>>>>>> EXTENDS keyword as Julian indicated:
> >>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>  WHERE goldHandicap < 10;
> >>>>>>>
> >>>>>>> Changing this by requiring the EXTENDS keyword would create a
> >> backward
> >>>>>>> compatibility problem.
> >>>>>>>
> >>>>>>> I think it'd be good if both of these extensions worked in Drill &
> >>>>>> Phoenix
> >>>>>>> given our Drillix initiative.
> >>>>>>>
> >>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
> >
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> My proposal was an a or b using the freemarker template in the
> >>>> grammar,
> >>>>>>>> not something later.
> >>>>>>>>
> >>>>>>>> Actually, put another way: we may want to consider stating that we
> >>>> only
> >>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
> >>>> should
> >>>>>>> be
> >>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
> >> (the
> >>>>>>> same
> >>>>>>>> way we plug in external things in Drill).
> >>>>>>>>
> >>>>>>>> Trying to get every project to agree on extensions seems like it
> >> may
> >>>> be
> >>>>>>>> hard.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jacques Nadeau
> >>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>
> >>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> I can see why Jacques wants this syntax.
> >>>>>>>>>
> >>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
> be
> >>>>>>>>> predictable. Any variation should happen at validation time, or
> >>>> later.
> >>>>>>>>>
> >>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
> >> avoiding
> >>>> a
> >>>>>>>>> tough design discussion.
> >>>>>>>>>
> >>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
> and
> >>>>>> they
> >>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
> >> Phoenix
> >>>>>> (by
> >>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
> >> SQL
> >>>>>>> syntax
> >>>>>>>>> should be.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
> >> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Looking at those two examples I agree with Jacques. The first
> >>>>>> appears
> >>>>>>>>> more
> >>>>>>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
> >> jacques@dremio.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
> >> we
> >>>>>>> could
> >>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
> >> seems
> >>>>>>>>>>> reasonable to support the table functions without the TABLE
> >> syntax.
> >>>>>>>>>>>
> >>>>>>>>>>> I for one definitely think the TABLE syntax is much more
> >> confusing
> >>>>>> to
> >>>>>>>>> use,
> >>>>>>>>>>> especially in the example that we're looking to support, such
> >> as:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>> fieldDelimiter
> >>>>>>> =>
> >>>>>>>>>>> '|', skipFirstRow => true)
> >>>>>>>>>>>
> >>>>>>>>>>> This seems much clearer than:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>>>>> fieldDelimiter
> >>>>>>>>>>> => '|', skipFirstRow => true))
> >>>>>>>>>>>
> >>>>>>>>>>> It also looks much more like a hint to the table (which is our
> >>>>>> goal).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for doing the legwork and finding what the other
> vendors
> >>>>>> do.
> >>>>>>>>> It is
> >>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >>>>>>> standard
> >>>>>>>>> an
> >>>>>>>>>>>> make the TABLE keyword optional.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >>>>>> clash
> >>>>>>>>> with
> >>>>>>>>>>>> one of our own (few) extensions. In
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
> the
> >>>>>>>>> EXTENDS
> >>>>>>>>>>>> clause. You can write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
> >> Emp
> >>>>>>> table
> >>>>>>>>>>> but
> >>>>>>>>>>>> you would like to use them in this particular query. We chose
> >> to
> >>>>>>> make
> >>>>>>>>> the
> >>>>>>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> That is uncomfortably close to
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>>>>>
> >>>>>>>>>>>> so we would require
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>>>>>
> >>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
> >>>>>> forms
> >>>>>>>>> like
> >>>>>>>>>>>> this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >>>>>>> should
> >>>>>>>>>>> also
> >>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
> >>>>>> there's
> >>>>>>> no
> >>>>>>>>>>>>> consensus about this.
> >>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
> >> calling
> >>>>>>> table
> >>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
> >> DB2
> >>>>>>> [4]
> >>>>>>>>>>>>> expect it.
> >>>>>>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >>>>>> generation
> >>>>>>> to
> >>>>>>>>>>>> have
> >>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
> >> MySQL
> >>>>>>> like
> >>>>>>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>>>>>> [2]
> >>>>>>>>>>>
> >>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>>>>>> [3]
> >>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>>>>>> [4]
> >>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>>>>>> [5]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
> >> function
> >>>>>>>>>>>>> overloading:
> https://github.com/apache/calcite/pull/166/files
> >>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
> >>>>>> called
> >>>>>>>>> more
> >>>>>>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
> >> want
> >>>>>>> to
> >>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll fill a JIRA for my other branch
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
> jhyde@apache.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
> julien@dremio.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >>>>>> Calcite
> >>>>>>>>> when
> >>>>>>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes; see below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
> >> now
> >>>>>>> just
> >>>>>>>>>>>> being
> >>>>>>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
> >> are
> >>>>>>> the
> >>>>>>>>>>> way
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> go.
> >>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
> >> syntax
> >>>>>>>>>>>>>> necessary?*
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Consider:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
> >> similar.
> >>>>>>> Also,
> >>>>>>>>>>> if
> >>>>>>>>>>>> f
> >>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
> >>>>>> this?:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from f
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >>>>>>> standards
> >>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
> >> disambiguate.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
> >> used
> >>>>>>> in
> >>>>>>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix
> it.
> >>>>>>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name,
> we
> >>>>>> get
> >>>>>>> a
> >>>>>>>>>>> NPE
> >>>>>>>>>>>>>> later on.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
> >> a
> >>>>>>> JIRA
> >>>>>>>>>>> case
> >>>>>>>>>>>>>> on resolution of overloaded functions when invoked with
> named
> >>>>>>>>>>> arguments.
> >>>>>>>>>>>>>> (It probably applies to all functions, not just table
> >>>>>> functions.)
> >>>>>>>>> The
> >>>>>>>>>>>> fix
> >>>>>>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> *Jim Scott*
> >>>>>>>>>> Director, Enterprise Strategy & Architecture
> >>>>>>>>>> +1 (347) 746-9281
> >>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>>>>>
> >>>>>>>>>> <http://www.mapr.com/>
> >>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>>>>>
> >>>>>>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Here you go: https://issues.apache.org/jira/browse/CALCITE-967
I was planning on providing patch for both master and the fork, but I
haven't started yet.

On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:

> You’re hitting the grammar ambiguity I expected.
>
> I think that base Calcite should require the full verbose syntax: the
> TABLE keyword for table functions and the EXTEND keyword for extends
> clauses. Then Drill can override to make TABLE optional, and Phoenix can
> override to make EXTEND optional.
>
> Are you changing the parser in your forked copy of Calcite, or are you
> changing Drill’s extensions to that parser?
>
> If the former, you (or I) should add extension points to Calcite’s parser
> make the TABLE keyword optional and to make the EXTEND keyword optional. No
> project should enable both extension points — otherwise they’ll end up with
> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>
> Julian
>
>
> > On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > Hi,
> > I've been trying to enable this but it looks like in the current grammar
> > (before my change) you can not use table functions and EXTEND together.
> > That's because they are on difference branches of an | in the grammar.
> > So I would suggest that we treat those as two separate improvement in two
> > different pull requests:
> > - not require table(...) to call table functions
> > - allow using table functions and extend together.
> > Does it make sense?
> > Julien
> >
> >
> > On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> To be clear, it should be possible to use a table function with all of
> >> the options -- EXTENDS clause, OVER clause, AS with alias and column
> >> aliases, TABLESAMPLE.
> >>
> >> I'm surprised that the parser didn't need more lookahead to choose
> >> between 't (x, y)' and 't (x INTEGER, y DATE)'.
> >>
> >> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >>> In the patch I just sent, probably not.
> >>> I will adjust it and add the corresponding test.
> >>>
> >>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>
> >>>> Can you use both together? Say
> >>>>
> >>>>  select columns
> >>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
> >> EXTEND
> >>>> (foo INTEGER)
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >>>>>
> >>>>> I took a stab at adding the TableFunction syntax without table(...)
> in
> >>>>> Calcite.
> >>>>> I have verified that both the table function and extend (with or
> >> without
> >>>>> keyword) work
> >>>>>
> >>>>
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >>>>>
> >>>>> These work:
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> >> fieldDelimiter
> >>>> =>
> >>>>> '|')
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >>>>> fieldDelimiter => '|'))
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`('JSON')
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >>>>>
> >>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> >>>> wrote:
> >>>>>
> >>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
> >> we
> >>>>>> should constrain ourselves to having a union of the two syntaxes.
> >>>>>>
> >>>>>>
> >>>>>> That being said, I think we could make these work together... maybe.
> >>>>>>
> >>>>>> Remove the EXTENDS without keyword syntax from the grammar.
> >>>>>>
> >>>>>> Create a new sub block in the table block that requires no keyword.
> >>>> There
> >>>>>> would be two paths (and would probably require some lookahead)
> >>>>>>
> >>>>>> option 1> unnamed parameters (1,2,3)
> >>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
> >>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >>>>>> golfHandicap INTEGER)
> >>>>>>
> >>>>>> Then we create a table function with options 1 & 2, an EXTENDS
> clause
> >>>> for
> >>>>>> option 3.
> >>>>>>
> >>>>>> Best of both worlds?
> >>>>>>
> >>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
> jamestaylor@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Phoenix already supports columns at read-time using the syntax
> >> without
> >>>>>> the
> >>>>>>> EXTENDS keyword as Julian indicated:
> >>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>  WHERE goldHandicap < 10;
> >>>>>>>
> >>>>>>> Changing this by requiring the EXTENDS keyword would create a
> >> backward
> >>>>>>> compatibility problem.
> >>>>>>>
> >>>>>>> I think it'd be good if both of these extensions worked in Drill &
> >>>>>> Phoenix
> >>>>>>> given our Drillix initiative.
> >>>>>>>
> >>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
> >
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> My proposal was an a or b using the freemarker template in the
> >>>> grammar,
> >>>>>>>> not something later.
> >>>>>>>>
> >>>>>>>> Actually, put another way: we may want to consider stating that we
> >>>> only
> >>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
> >>>> should
> >>>>>>> be
> >>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
> >> (the
> >>>>>>> same
> >>>>>>>> way we plug in external things in Drill).
> >>>>>>>>
> >>>>>>>> Trying to get every project to agree on extensions seems like it
> >> may
> >>>> be
> >>>>>>>> hard.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jacques Nadeau
> >>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>
> >>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> I can see why Jacques wants this syntax.
> >>>>>>>>>
> >>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
> be
> >>>>>>>>> predictable. Any variation should happen at validation time, or
> >>>> later.
> >>>>>>>>>
> >>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
> >> avoiding
> >>>> a
> >>>>>>>>> tough design discussion.
> >>>>>>>>>
> >>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
> and
> >>>>>> they
> >>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
> >> Phoenix
> >>>>>> (by
> >>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
> >> SQL
> >>>>>>> syntax
> >>>>>>>>> should be.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
> >> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Looking at those two examples I agree with Jacques. The first
> >>>>>> appears
> >>>>>>>>> more
> >>>>>>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
> >> jacques@dremio.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
> >> we
> >>>>>>> could
> >>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
> >> seems
> >>>>>>>>>>> reasonable to support the table functions without the TABLE
> >> syntax.
> >>>>>>>>>>>
> >>>>>>>>>>> I for one definitely think the TABLE syntax is much more
> >> confusing
> >>>>>> to
> >>>>>>>>> use,
> >>>>>>>>>>> especially in the example that we're looking to support, such
> >> as:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>> fieldDelimiter
> >>>>>>> =>
> >>>>>>>>>>> '|', skipFirstRow => true)
> >>>>>>>>>>>
> >>>>>>>>>>> This seems much clearer than:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>>>>> fieldDelimiter
> >>>>>>>>>>> => '|', skipFirstRow => true))
> >>>>>>>>>>>
> >>>>>>>>>>> It also looks much more like a hint to the table (which is our
> >>>>>> goal).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for doing the legwork and finding what the other
> vendors
> >>>>>> do.
> >>>>>>>>> It is
> >>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >>>>>>> standard
> >>>>>>>>> an
> >>>>>>>>>>>> make the TABLE keyword optional.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >>>>>> clash
> >>>>>>>>> with
> >>>>>>>>>>>> one of our own (few) extensions. In
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
> the
> >>>>>>>>> EXTENDS
> >>>>>>>>>>>> clause. You can write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
> >> Emp
> >>>>>>> table
> >>>>>>>>>>> but
> >>>>>>>>>>>> you would like to use them in this particular query. We chose
> >> to
> >>>>>>> make
> >>>>>>>>> the
> >>>>>>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> That is uncomfortably close to
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>>>>>
> >>>>>>>>>>>> so we would require
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>>>>>
> >>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
> >>>>>> forms
> >>>>>>>>> like
> >>>>>>>>>>>> this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >>>>>>> should
> >>>>>>>>>>> also
> >>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
> >>>>>> there's
> >>>>>>> no
> >>>>>>>>>>>>> consensus about this.
> >>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
> >> calling
> >>>>>>> table
> >>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
> >> DB2
> >>>>>>> [4]
> >>>>>>>>>>>>> expect it.
> >>>>>>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >>>>>> generation
> >>>>>>> to
> >>>>>>>>>>>> have
> >>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
> >> MySQL
> >>>>>>> like
> >>>>>>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>>>>>> [2]
> >>>>>>>>>>>
> >>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>>>>>> [3]
> >>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>>>>>> [4]
> >>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>>>>>> [5]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
> >> function
> >>>>>>>>>>>>> overloading:
> https://github.com/apache/calcite/pull/166/files
> >>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
> >>>>>> called
> >>>>>>>>> more
> >>>>>>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
> >> want
> >>>>>>> to
> >>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll fill a JIRA for my other branch
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
> jhyde@apache.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
> julien@dremio.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >>>>>> Calcite
> >>>>>>>>> when
> >>>>>>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes; see below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
> >> now
> >>>>>>> just
> >>>>>>>>>>>> being
> >>>>>>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
> >> are
> >>>>>>> the
> >>>>>>>>>>> way
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> go.
> >>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
> >> syntax
> >>>>>>>>>>>>>> necessary?*
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Consider:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
> >> similar.
> >>>>>>> Also,
> >>>>>>>>>>> if
> >>>>>>>>>>>> f
> >>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
> >>>>>> this?:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from f
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >>>>>>> standards
> >>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
> >> disambiguate.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
> >> used
> >>>>>>> in
> >>>>>>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix
> it.
> >>>>>>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name,
> we
> >>>>>> get
> >>>>>>> a
> >>>>>>>>>>> NPE
> >>>>>>>>>>>>>> later on.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
> >> a
> >>>>>>> JIRA
> >>>>>>>>>>> case
> >>>>>>>>>>>>>> on resolution of overloaded functions when invoked with
> named
> >>>>>>>>>>> arguments.
> >>>>>>>>>>>>>> (It probably applies to all functions, not just table
> >>>>>> functions.)
> >>>>>>>>> The
> >>>>>>>>>>>> fix
> >>>>>>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> *Jim Scott*
> >>>>>>>>>> Director, Enterprise Strategy & Architecture
> >>>>>>>>>> +1 (347) 746-9281
> >>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>>>>>
> >>>>>>>>>> <http://www.mapr.com/>
> >>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>>>>>
> >>>>>>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

You’re hitting the grammar ambiguity I expected.

I think that base Calcite should require the full verbose syntax: the TABLE keyword for table functions and the EXTEND keyword for extends clauses. Then Drill can override to make TABLE optional, and Phoenix can override to make EXTEND optional.

Are you changing the parser in your forked copy of Calcite, or are you changing Drill’s extensions to that parser?

If the former, you (or I) should add extension points to Calcite’s parser make the TABLE keyword optional and to make the EXTEND keyword optional. No project should enable both extension points — otherwise they’ll end up with an ambiguous grammar. If you agree create a Calcite JIRA case for this.

Julian

 
> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Hi,
> I've been trying to enable this but it looks like in the current grammar
> (before my change) you can not use table functions and EXTEND together.
> That's because they are on difference branches of an | in the grammar.
> So I would suggest that we treat those as two separate improvement in two
> different pull requests:
> - not require table(...) to call table functions
> - allow using table functions and extend together.
> Does it make sense?
> Julien
> 
> 
> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> To be clear, it should be possible to use a table function with all of
>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>> aliases, TABLESAMPLE.
>> 
>> I'm surprised that the parser didn't need more lookahead to choose
>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>> 
>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> In the patch I just sent, probably not.
>>> I will adjust it and add the corresponding test.
>>> 
>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> Can you use both together? Say
>>>> 
>>>>  select columns
>>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>> EXTEND
>>>> (foo INTEGER)
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> 
>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>> Calcite.
>>>>> I have verified that both the table function and extend (with or
>> without
>>>>> keyword) work
>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>> 
>>>>> These work:
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>> fieldDelimiter
>>>> =>
>>>>> '|')
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|'))
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>> we
>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>> 
>>>>>> 
>>>>>> That being said, I think we could make these work together... maybe.
>>>>>> 
>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>> 
>>>>>> Create a new sub block in the table block that requires no keyword.
>>>> There
>>>>>> would be two paths (and would probably require some lookahead)
>>>>>> 
>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>> golfHandicap INTEGER)
>>>>>> 
>>>>>> Then we create a table function with options 1 & 2, an EXTENDS clause
>>>> for
>>>>>> option 3.
>>>>>> 
>>>>>> Best of both worlds?
>>>>>> 
>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Phoenix already supports columns at read-time using the syntax
>> without
>>>>>> the
>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>> INTEGER)
>>>>>>>  WHERE goldHandicap < 10;
>>>>>>> 
>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>> backward
>>>>>>> compatibility problem.
>>>>>>> 
>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>> Phoenix
>>>>>>> given our Drillix initiative.
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> My proposal was an a or b using the freemarker template in the
>>>> grammar,
>>>>>>>> not something later.
>>>>>>>> 
>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>> only
>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>> should
>>>>>>> be
>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>> (the
>>>>>>> same
>>>>>>>> way we plug in external things in Drill).
>>>>>>>> 
>>>>>>>> Trying to get every project to agree on extensions seems like it
>> may
>>>> be
>>>>>>>> hard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>> 
>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>> later.
>>>>>>>>> 
>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>> avoiding
>>>> a
>>>>>>>>> tough design discussion.
>>>>>>>>> 
>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>>>>>> they
>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>> Phoenix
>>>>>> (by
>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>> SQL
>>>>>>> syntax
>>>>>>>>> should be.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>> appears
>>>>>>>>> more
>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>> jacques@dremio.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>> we
>>>>>>> could
>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>> seems
>>>>>>>>>>> reasonable to support the table functions without the TABLE
>> syntax.
>>>>>>>>>>> 
>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>> confusing
>>>>>> to
>>>>>>>>> use,
>>>>>>>>>>> especially in the example that we're looking to support, such
>> as:
>>>>>>>>>>> 
>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>> fieldDelimiter
>>>>>>> =>
>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>> 
>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>> 
>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>> fieldDelimiter
>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>> 
>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>> goal).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors
>>>>>> do.
>>>>>>>>> It is
>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>>>>>> standard
>>>>>>>>> an
>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>> clash
>>>>>>>>> with
>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>>>>>> EXTENDS
>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
>> INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>> Emp
>>>>>>> table
>>>>>>>>>>> but
>>>>>>>>>>>> you would like to use them in this particular query. We chose
>> to
>>>>>>> make
>>>>>>>>> the
>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>> 
>>>>>>>>>>>> so we would require
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>> 
>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>> forms
>>>>>>>>> like
>>>>>>>>>>>> this:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>> 
>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>> should
>>>>>>>>>>> also
>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
>>>>>> there's
>>>>>>> no
>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
>> calling
>>>>>>> table
>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
>> DB2
>>>>>>> [4]
>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>>>>>> generation
>>>>>>> to
>>>>>>>>>>>> have
>>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
>> MySQL
>>>>>>> like
>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>> [2]
>>>>>>>>>>> 
>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>> [3]
>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>> [4]
>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>> [5]
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>> function
>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>> called
>>>>>>>>> more
>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>> want
>>>>>>> to
>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Julien
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>> Calcite
>>>>>>>>> when
>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>> now
>>>>>>> just
>>>>>>>>>>>> being
>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>> are
>>>>>>> the
>>>>>>>>>>> way
>>>>>>>>>>>> to
>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>> syntax
>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>> similar.
>>>>>>> Also,
>>>>>>>>>>> if
>>>>>>>>>>>> f
>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>> this?:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>> standards
>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>> disambiguate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>> used
>>>>>>> in
>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>>>>>> get
>>>>>>> a
>>>>>>>>>>> NPE
>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>> a
>>>>>>> JIRA
>>>>>>>>>>> case
>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>> functions.)
>>>>>>>>> The
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Jim Scott*
>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>> 
>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>> 
>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>> <
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

You’re hitting the grammar ambiguity I expected.

I think that base Calcite should require the full verbose syntax: the TABLE keyword for table functions and the EXTEND keyword for extends clauses. Then Drill can override to make TABLE optional, and Phoenix can override to make EXTEND optional.

Are you changing the parser in your forked copy of Calcite, or are you changing Drill’s extensions to that parser?

If the former, you (or I) should add extension points to Calcite’s parser make the TABLE keyword optional and to make the EXTEND keyword optional. No project should enable both extension points — otherwise they’ll end up with an ambiguous grammar. If you agree create a Calcite JIRA case for this.

Julian

 
> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Hi,
> I've been trying to enable this but it looks like in the current grammar
> (before my change) you can not use table functions and EXTEND together.
> That's because they are on difference branches of an | in the grammar.
> So I would suggest that we treat those as two separate improvement in two
> different pull requests:
> - not require table(...) to call table functions
> - allow using table functions and extend together.
> Does it make sense?
> Julien
> 
> 
> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> To be clear, it should be possible to use a table function with all of
>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>> aliases, TABLESAMPLE.
>> 
>> I'm surprised that the parser didn't need more lookahead to choose
>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>> 
>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> In the patch I just sent, probably not.
>>> I will adjust it and add the corresponding test.
>>> 
>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> Can you use both together? Say
>>>> 
>>>>  select columns
>>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>> EXTEND
>>>> (foo INTEGER)
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> 
>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>> Calcite.
>>>>> I have verified that both the table function and extend (with or
>> without
>>>>> keyword) work
>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>> 
>>>>> These work:
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>> fieldDelimiter
>>>> =>
>>>>> '|')
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|'))
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>> we
>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>> 
>>>>>> 
>>>>>> That being said, I think we could make these work together... maybe.
>>>>>> 
>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>> 
>>>>>> Create a new sub block in the table block that requires no keyword.
>>>> There
>>>>>> would be two paths (and would probably require some lookahead)
>>>>>> 
>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>> golfHandicap INTEGER)
>>>>>> 
>>>>>> Then we create a table function with options 1 & 2, an EXTENDS clause
>>>> for
>>>>>> option 3.
>>>>>> 
>>>>>> Best of both worlds?
>>>>>> 
>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Phoenix already supports columns at read-time using the syntax
>> without
>>>>>> the
>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>> INTEGER)
>>>>>>>  WHERE goldHandicap < 10;
>>>>>>> 
>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>> backward
>>>>>>> compatibility problem.
>>>>>>> 
>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>> Phoenix
>>>>>>> given our Drillix initiative.
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> My proposal was an a or b using the freemarker template in the
>>>> grammar,
>>>>>>>> not something later.
>>>>>>>> 
>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>> only
>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>> should
>>>>>>> be
>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>> (the
>>>>>>> same
>>>>>>>> way we plug in external things in Drill).
>>>>>>>> 
>>>>>>>> Trying to get every project to agree on extensions seems like it
>> may
>>>> be
>>>>>>>> hard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>> 
>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>> later.
>>>>>>>>> 
>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>> avoiding
>>>> a
>>>>>>>>> tough design discussion.
>>>>>>>>> 
>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>>>>>> they
>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>> Phoenix
>>>>>> (by
>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>> SQL
>>>>>>> syntax
>>>>>>>>> should be.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>> appears
>>>>>>>>> more
>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>> jacques@dremio.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>> we
>>>>>>> could
>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>> seems
>>>>>>>>>>> reasonable to support the table functions without the TABLE
>> syntax.
>>>>>>>>>>> 
>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>> confusing
>>>>>> to
>>>>>>>>> use,
>>>>>>>>>>> especially in the example that we're looking to support, such
>> as:
>>>>>>>>>>> 
>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>> fieldDelimiter
>>>>>>> =>
>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>> 
>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>> 
>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>> fieldDelimiter
>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>> 
>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>> goal).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors
>>>>>> do.
>>>>>>>>> It is
>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>>>>>> standard
>>>>>>>>> an
>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>> clash
>>>>>>>>> with
>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>>>>>> EXTENDS
>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
>> INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>> Emp
>>>>>>> table
>>>>>>>>>>> but
>>>>>>>>>>>> you would like to use them in this particular query. We chose
>> to
>>>>>>> make
>>>>>>>>> the
>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>> 
>>>>>>>>>>>> so we would require
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>> 
>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>> forms
>>>>>>>>> like
>>>>>>>>>>>> this:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>> 
>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>> should
>>>>>>>>>>> also
>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
>>>>>> there's
>>>>>>> no
>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
>> calling
>>>>>>> table
>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
>> DB2
>>>>>>> [4]
>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>>>>>> generation
>>>>>>> to
>>>>>>>>>>>> have
>>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
>> MySQL
>>>>>>> like
>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>> [2]
>>>>>>>>>>> 
>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>> [3]
>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>> [4]
>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>> [5]
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>> function
>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>> called
>>>>>>>>> more
>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>> want
>>>>>>> to
>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Julien
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>> Calcite
>>>>>>>>> when
>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>> now
>>>>>>> just
>>>>>>>>>>>> being
>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>> are
>>>>>>> the
>>>>>>>>>>> way
>>>>>>>>>>>> to
>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>> syntax
>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>> similar.
>>>>>>> Also,
>>>>>>>>>>> if
>>>>>>>>>>>> f
>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>> this?:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>> standards
>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>> disambiguate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>> used
>>>>>>> in
>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>>>>>> get
>>>>>>> a
>>>>>>>>>>> NPE
>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>> a
>>>>>>> JIRA
>>>>>>>>>>> case
>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>> functions.)
>>>>>>>>> The
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Jim Scott*
>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>> 
>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>> 
>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>> <
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Hi,
I've been trying to enable this but it looks like in the current grammar
(before my change) you can not use table functions and EXTEND together.
That's because they are on difference branches of an | in the grammar.
So I would suggest that we treat those as two separate improvement in two
different pull requests:
 - not require table(...) to call table functions
 - allow using table functions and extend together.
Does it make sense?
Julien


On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:

> To be clear, it should be possible to use a table function with all of
> the options -- EXTENDS clause, OVER clause, AS with alias and column
> aliases, TABLESAMPLE.
>
> I'm surprised that the parser didn't need more lookahead to choose
> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>
> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > In the patch I just sent, probably not.
> > I will adjust it and add the corresponding test.
> >
> > On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> Can you use both together? Say
> >>
> >>   select columns
> >>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
> EXTEND
> >> (foo INTEGER)
> >>
> >> Julian
> >>
> >>
> >>
> >> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >> >
> >> > I took a stab at adding the TableFunction syntax without table(...) in
> >> > Calcite.
> >> > I have verified that both the table function and extend (with or
> without
> >> > keyword) work
> >> >
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >> >
> >> > These work:
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter
> >> =>
> >> > '|')
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >> > fieldDelimiter => '|'))
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >> >
> >> > select columns from dfs.`/path/to/myfile`('JSON')
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >> >
> >> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >> >
> >> >> Drill does implicitly what Phoenix does explicitly so I don't think
> we
> >> >> should constrain ourselves to having a union of the two syntaxes.
> >> >>
> >> >>
> >> >> That being said, I think we could make these work together... maybe.
> >> >>
> >> >> Remove the EXTENDS without keyword syntax from the grammar.
> >> >>
> >> >> Create a new sub block in the table block that requires no keyword.
> >> There
> >> >> would be two paths (and would probably require some lookahead)
> >> >>
> >> >> option 1> unnamed parameters (1,2,3)
> >> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> >> golfHandicap INTEGER)
> >> >>
> >> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> >> for
> >> >> option 3.
> >> >>
> >> >> Best of both worlds?
> >> >>
> >> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
> >
> >> >> wrote:
> >> >>
> >> >>> Phoenix already supports columns at read-time using the syntax
> without
> >> >> the
> >> >>> EXTENDS keyword as Julian indicated:
> >> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
> INTEGER)
> >> >>>   WHERE goldHandicap < 10;
> >> >>>
> >> >>> Changing this by requiring the EXTENDS keyword would create a
> backward
> >> >>> compatibility problem.
> >> >>>
> >> >>> I think it'd be good if both of these extensions worked in Drill &
> >> >> Phoenix
> >> >>> given our Drillix initiative.
> >> >>>
> >> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> >> >> wrote:
> >> >>>
> >> >>>> My proposal was an a or b using the freemarker template in the
> >> grammar,
> >> >>>> not something later.
> >> >>>>
> >> >>>> Actually, put another way: we may want to consider stating that we
> >> only
> >> >>>> incorporate SQL standards in our primary grammar. Any extensions
> >> should
> >> >>> be
> >> >>>> optional grammar. We could simply have grammar plugins in Calcite
> (the
> >> >>> same
> >> >>>> way we plug in external things in Drill).
> >> >>>>
> >> >>>> Trying to get every project to agree on extensions seems like it
> may
> >> be
> >> >>>> hard.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Jacques Nadeau
> >> >>>> CTO and Co-Founder, Dremio
> >> >>>>
> >> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >> >>>>
> >> >>>>> I can see why Jacques wants this syntax.
> >> >>>>>
> >> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
> >> >>>>> predictable. Any variation should happen at validation time, or
> >> later.
> >> >>>>>
> >> >>>>> Also, we shouldn’t add configuration parameters as a way of
> avoiding
> >> a
> >> >>>>> tough design discussion.
> >> >>>>>
> >> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> >> they
> >> >>>>> are both applicable to Drill and Phoenix. I think Drill and
> Phoenix
> >> >> (by
> >> >>>>> which I mean Jacques and James, I guess) need to agree what the
> SQL
> >> >>> syntax
> >> >>>>> should be.
> >> >>>>>
> >> >>>>> Julian
> >> >>>>>
> >> >>>>>
> >> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
> wrote:
> >> >>>>>>
> >> >>>>>> Looking at those two examples I agree with Jacques. The first
> >> >> appears
> >> >>>>> more
> >> >>>>>> like a hint from the syntactic sugar point of view.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
> jacques@dremio.com
> >> >>>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that
> we
> >> >>> could
> >> >>>>>>> have a switch. Given that SQL Server and Postgres support it
> seems
> >> >>>>>>> reasonable to support the table functions without the TABLE
> syntax.
> >> >>>>>>>
> >> >>>>>>> I for one definitely think the TABLE syntax is much more
> confusing
> >> >> to
> >> >>>>> use,
> >> >>>>>>> especially in the example that we're looking to support, such
> as:
> >> >>>>>>>
> >> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> >> fieldDelimiter
> >> >>> =>
> >> >>>>>>> '|', skipFirstRow => true)
> >> >>>>>>>
> >> >>>>>>> This seems much clearer than:
> >> >>>>>>>
> >> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >> >>>>> fieldDelimiter
> >> >>>>>>> => '|', skipFirstRow => true))
> >> >>>>>>>
> >> >>>>>>> It also looks much more like a hint to the table (which is our
> >> >> goal).
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Jacques Nadeau
> >> >>>>>>> CTO and Co-Founder, Dremio
> >> >>>>>>>
> >> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >> >>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Thanks for doing the legwork and finding what the other vendors
> >> >> do.
> >> >>>>> It is
> >> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >> >>> standard
> >> >>>>> an
> >> >>>>>>>> make the TABLE keyword optional.
> >> >>>>>>>>
> >> >>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >> >> clash
> >> >>>>> with
> >> >>>>>>>> one of our own (few) extensions. In
> >> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >> >>>>> EXTENDS
> >> >>>>>>>> clause. You can write
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
> INTEGER)
> >> >>>>>>>> WHERE goldHandicap < 10;
> >> >>>>>>>>
> >> >>>>>>>> to tell Calcite that there are two undeclared columns in the
> Emp
> >> >>> table
> >> >>>>>>> but
> >> >>>>>>>> you would like to use them in this particular query. We chose
> to
> >> >>> make
> >> >>>>> the
> >> >>>>>>>> EXTEND keyword optional, so you could instead write
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>>>>>> WHERE goldHandicap < 10;
> >> >>>>>>>>
> >> >>>>>>>> That is uncomfortably close to
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >> >>>>>>>>
> >> >>>>>>>> so we would require
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >> >>>>>>>>
> >> >>>>>>>> if EmpFunction was a table-function. You could combine the two
> >> >> forms
> >> >>>>> like
> >> >>>>>>>> this:
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >> >>>>>>>> (anotherAttribute INTEGER);
> >> >>>>>>>>
> >> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >> >>> should
> >> >>>>>>> also
> >> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >> >>>>>>>>
> >> >>>>>>>> Julian
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> >> >>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> - Table function syntax: I did a quick search and it seems
> >> >> there's
> >> >>> no
> >> >>>>>>>>> consensus about this.
> >> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
> calling
> >> >>> table
> >> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
> DB2
> >> >>> [4]
> >> >>>>>>>>> expect it.
> >> >>>>>>>>> MySQL does not have table functions [5]
> >> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >> >>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >> >> generation
> >> >>> to
> >> >>>>>>>> have
> >> >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
> MySQL
> >> >>> like
> >> >>>>>>>>> syntax (back ticks for identifiers etc)
> >> >>>>>>>>>
> >> >>>>>>>>> [1]
> >> >>>>>>>
> >> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >> >>>>>>>>> [2]
> >> >>>>>>>
> >> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >> >>>>>>>>> [3]
> >> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >> >>>>>>>>> [4]
> >> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >> >>>>>>>>> [5]
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>
> >> >>>
> >> >>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >> >>>>>>>>>
> >> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
> function
> >> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >> >>>>>>>>> But that seems too easy to be true. Possibly this method is
> >> >> called
> >> >>>>> more
> >> >>>>>>>>> than once (before and after the function has been resolved?)
> >> >>>>>>>>>
> >> >>>>>>>>> FYI this would happen only when using named parameter. We do
> want
> >> >>> to
> >> >>>>>>>>> overload in this case, which is why I'm looking into it.
> >> >>>>>>>>>
> >> >>>>>>>>> I'll fill a JIRA for my other branch
> >> >>>>>>>>>
> >> >>>>>>>>> Julien
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
> >
> >> >>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
> >
> >> >>>>> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >> >> Calcite
> >> >>>>> when
> >> >>>>>>>>>> there's more than 1 function with the same name.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yes; see below.
> >> >>>>>>>>>>
> >> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
> now
> >> >>> just
> >> >>>>>>>> being
> >> >>>>>>>>>> able to specify the delimiter for csv files.
> >> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
> are
> >> >>> the
> >> >>>>>>> way
> >> >>>>>>>> to
> >> >>>>>>>>>> go.
> >> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
> syntax
> >> >>>>>>>>>> necessary?*
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Consider:
> >> >>>>>>>>>>
> >> >>>>>>>>>> select * from myTable as f(x, y)
> >> >>>>>>>>>> select * from myTable f(x, y)
> >> >>>>>>>>>> select * from myFunction(x, y)
> >> >>>>>>>>>>
> >> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
> similar.
> >> >>> Also,
> >> >>>>>>> if
> >> >>>>>>>> f
> >> >>>>>>>>>> is a function with zero arguments, could you invoke it like
> >> >> this?:
> >> >>>>>>>>>>
> >> >>>>>>>>>> select * from f
> >> >>>>>>>>>>
> >> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >> >>> standards
> >> >>>>>>>>>> people in their wisdom decided to add a keyword to
> disambiguate.
> >> >>>>>>>>>>
> >> >>>>>>>>>> I had to fix some things in Calcite to enable this:
> >> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
> used
> >> >>> in
> >> >>>>>>>>>> Calcite for the Maze example.
> >> >>>>>>>>>> Which is why some hooks were missing.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Can you log a jira case to track this bug?
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >> >>>>>>>>>> Here is a test that reproduces the problem:
> >> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
> >> >> get
> >> >>> a
> >> >>>>>>> NPE
> >> >>>>>>>>>> later on.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
> a
> >> >>> JIRA
> >> >>>>>>> case
> >> >>>>>>>>>> on resolution of overloaded functions when invoked with named
> >> >>>>>>> arguments.
> >> >>>>>>>>>> (It probably applies to all functions, not just table
> >> >> functions.)
> >> >>>>> The
> >> >>>>>>>> fix
> >> >>>>>>>>>> will take a while (if you wait for me to write it).
> >> >>>>>>>>>>
> >> >>>>>>>>>> For now please tell your users not to overload. :)
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Julian
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>> Julien
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> *Jim Scott*
> >> >>>>>> Director, Enterprise Strategy & Architecture
> >> >>>>>> +1 (347) 746-9281
> >> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >> >>>>>>
> >> >>>>>> <http://www.mapr.com/>
> >> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >> >>>>>>
> >> >>>>>> Now Available - Free Hadoop On-Demand Training
> >> >>>>>> <
> >> >>>>>
> >> >>>
> >> >>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Julien
> >>
> >>
> >
> >
> > --
> > Julien
>



-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

To be clear, it should be possible to use a table function with all of
the options -- EXTENDS clause, OVER clause, AS with alias and column
aliases, TABLESAMPLE.

I'm surprised that the parser didn't need more lookahead to choose
between 't (x, y)' and 't (x INTEGER, y DATE)'.

On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> In the patch I just sent, probably not.
> I will adjust it and add the corresponding test.
>
> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> Can you use both together? Say
>>
>>   select columns
>>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND
>> (foo INTEGER)
>>
>> Julian
>>
>>
>>
>> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
>> >
>> > I took a stab at adding the TableFunction syntax without table(...) in
>> > Calcite.
>> > I have verified that both the table function and extend (with or without
>> > keyword) work
>> >
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>> >
>> > These work:
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
>> =>
>> > '|')
>> >
>> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>> > fieldDelimiter => '|'))
>> >
>> > select columns from table(dfs.`/path/to/myfile`('JSON'))
>> >
>> > select columns from dfs.`/path/to/myfile`('JSON')
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
>> >
>> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>> >
>> >> Drill does implicitly what Phoenix does explicitly so I don't think we
>> >> should constrain ourselves to having a union of the two syntaxes.
>> >>
>> >>
>> >> That being said, I think we could make these work together... maybe.
>> >>
>> >> Remove the EXTENDS without keyword syntax from the grammar.
>> >>
>> >> Create a new sub block in the table block that requires no keyword.
>> There
>> >> would be two paths (and would probably require some lookahead)
>> >>
>> >> option 1> unnamed parameters (1,2,3)
>> >> option 2> named parameters (a => 1, b=>2, c=> 3)
>> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> >> golfHandicap INTEGER)
>> >>
>> >> Then we create a table function with options 1 & 2, an EXTENDS clause
>> for
>> >> option 3.
>> >>
>> >> Best of both worlds?
>> >>
>> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> >> wrote:
>> >>
>> >>> Phoenix already supports columns at read-time using the syntax without
>> >> the
>> >>> EXTENDS keyword as Julian indicated:
>> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>   WHERE goldHandicap < 10;
>> >>>
>> >>> Changing this by requiring the EXTENDS keyword would create a backward
>> >>> compatibility problem.
>> >>>
>> >>> I think it'd be good if both of these extensions worked in Drill &
>> >> Phoenix
>> >>> given our Drillix initiative.
>> >>>
>> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> >> wrote:
>> >>>
>> >>>> My proposal was an a or b using the freemarker template in the
>> grammar,
>> >>>> not something later.
>> >>>>
>> >>>> Actually, put another way: we may want to consider stating that we
>> only
>> >>>> incorporate SQL standards in our primary grammar. Any extensions
>> should
>> >>> be
>> >>>> optional grammar. We could simply have grammar plugins in Calcite (the
>> >>> same
>> >>>> way we plug in external things in Drill).
>> >>>>
>> >>>> Trying to get every project to agree on extensions seems like it may
>> be
>> >>>> hard.
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Jacques Nadeau
>> >>>> CTO and Co-Founder, Dremio
>> >>>>
>> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>
>> >>>>> I can see why Jacques wants this syntax.
>> >>>>>
>> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>> >>>>> predictable. Any variation should happen at validation time, or
>> later.
>> >>>>>
>> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
>> a
>> >>>>> tough design discussion.
>> >>>>>
>> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>> >> they
>> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
>> >> (by
>> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>> >>> syntax
>> >>>>> should be.
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>> >>>>>>
>> >>>>>> Looking at those two examples I agree with Jacques. The first
>> >> appears
>> >>>>> more
>> >>>>>> like a hint from the syntactic sugar point of view.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>> >>>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
>> >>> could
>> >>>>>>> have a switch. Given that SQL Server and Postgres support it seems
>> >>>>>>> reasonable to support the table functions without the TABLE syntax.
>> >>>>>>>
>> >>>>>>> I for one definitely think the TABLE syntax is much more confusing
>> >> to
>> >>>>> use,
>> >>>>>>> especially in the example that we're looking to support, such as:
>> >>>>>>>
>> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> >> fieldDelimiter
>> >>> =>
>> >>>>>>> '|', skipFirstRow => true)
>> >>>>>>>
>> >>>>>>> This seems much clearer than:
>> >>>>>>>
>> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> >>>>> fieldDelimiter
>> >>>>>>> => '|', skipFirstRow => true))
>> >>>>>>>
>> >>>>>>> It also looks much more like a hint to the table (which is our
>> >> goal).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Jacques Nadeau
>> >>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>
>> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks for doing the legwork and finding what the other vendors
>> >> do.
>> >>>>> It is
>> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>> >>> standard
>> >>>>> an
>> >>>>>>>> make the TABLE keyword optional.
>> >>>>>>>>
>> >>>>>>>> I tried that syntax in Calcite and discovered that there is a
>> >> clash
>> >>>>> with
>> >>>>>>>> one of our own (few) extensions. In
>> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>> >>>>> EXTENDS
>> >>>>>>>> clause. You can write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>>>>>> WHERE goldHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
>> >>> table
>> >>>>>>> but
>> >>>>>>>> you would like to use them in this particular query. We chose to
>> >>> make
>> >>>>> the
>> >>>>>>>> EXTEND keyword optional, so you could instead write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>>>>>> WHERE goldHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> That is uncomfortably close to
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>>>>>>
>> >>>>>>>> so we would require
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>>>>>>
>> >>>>>>>> if EmpFunction was a table-function. You could combine the two
>> >> forms
>> >>>>> like
>> >>>>>>>> this:
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>>>>>>> (anotherAttribute INTEGER);
>> >>>>>>>>
>> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>> >>> should
>> >>>>>>> also
>> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>>>>>>
>> >>>>>>>> Julian
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>> >>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> - Table function syntax: I did a quick search and it seems
>> >> there's
>> >>> no
>> >>>>>>>>> consensus about this.
>> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling
>> >>> table
>> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2
>> >>> [4]
>> >>>>>>>>> expect it.
>> >>>>>>>>> MySQL does not have table functions [5]
>> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>>>>>>> Would it be reasonable to allow a switch in the grammar
>> >> generation
>> >>> to
>> >>>>>>>> have
>> >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL
>> >>> like
>> >>>>>>>>> syntax (back ticks for identifiers etc)
>> >>>>>>>>>
>> >>>>>>>>> [1]
>> >>>>>>>
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>>>>>>> [2]
>> >>>>>>>
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>>>>>>> [3]
>> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>>>>>>> [4]
>> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>>>>>>> [5]
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>>>>>>
>> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>>>>>>> But that seems too easy to be true. Possibly this method is
>> >> called
>> >>>>> more
>> >>>>>>>>> than once (before and after the function has been resolved?)
>> >>>>>>>>>
>> >>>>>>>>> FYI this would happen only when using named parameter. We do want
>> >>> to
>> >>>>>>>>> overload in this case, which is why I'm looking into it.
>> >>>>>>>>>
>> >>>>>>>>> I'll fill a JIRA for my other branch
>> >>>>>>>>>
>> >>>>>>>>> Julien
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>> >>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> >> Calcite
>> >>>>> when
>> >>>>>>>>>> there's more than 1 function with the same name.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes; see below.
>> >>>>>>>>>>
>> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
>> >>> just
>> >>>>>>>> being
>> >>>>>>>>>> able to specify the delimiter for csv files.
>> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
>> >>> the
>> >>>>>>> way
>> >>>>>>>> to
>> >>>>>>>>>> go.
>> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>>>>>>> necessary?*
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Consider:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from myTable as f(x, y)
>> >>>>>>>>>> select * from myTable f(x, y)
>> >>>>>>>>>> select * from myFunction(x, y)
>> >>>>>>>>>>
>> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>> >>> Also,
>> >>>>>>> if
>> >>>>>>>> f
>> >>>>>>>>>> is a function with zero arguments, could you invoke it like
>> >> this?:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from f
>> >>>>>>>>>>
>> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>> >>> standards
>> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>>>>>>
>> >>>>>>>>>> I had to fix some things in Calcite to enable this:
>> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>> >>> in
>> >>>>>>>>>> Calcite for the Maze example.
>> >>>>>>>>>> Which is why some hooks were missing.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Can you log a jira case to track this bug?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>>>>>>> Here is a test that reproduces the problem:
>> >>>>>>>>>> https://github.com/apache/calcite/pull/166
>> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> >> get
>> >>> a
>> >>>>>>> NPE
>> >>>>>>>>>> later on.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>> >>> JIRA
>> >>>>>>> case
>> >>>>>>>>>> on resolution of overloaded functions when invoked with named
>> >>>>>>> arguments.
>> >>>>>>>>>> (It probably applies to all functions, not just table
>> >> functions.)
>> >>>>> The
>> >>>>>>>> fix
>> >>>>>>>>>> will take a while (if you wait for me to write it).
>> >>>>>>>>>>
>> >>>>>>>>>> For now please tell your users not to overload. :)
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Julian
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Julien
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Jim Scott*
>> >>>>>> Director, Enterprise Strategy & Architecture
>> >>>>>> +1 (347) 746-9281
>> >>>>>> @kingmesal <https://twitter.com/kingmesal>
>> >>>>>>
>> >>>>>> <http://www.mapr.com/>
>> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
>> >>>>>>
>> >>>>>> Now Available - Free Hadoop On-Demand Training
>> >>>>>> <
>> >>>>>
>> >>>
>> >>
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Julien
>>
>>
>
>
> --
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

In the patch I just sent, probably not.
I will adjust it and add the corresponding test.

On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:

> Can you use both together? Say
>
>   select columns
>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND
> (foo INTEGER)
>
> Julian
>
>
>
> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > I took a stab at adding the TableFunction syntax without table(...) in
> > Calcite.
> > I have verified that both the table function and extend (with or without
> > keyword) work
> >
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >
> > These work:
> >
> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
> =>
> > '|')
> >
> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> > fieldDelimiter => '|'))
> >
> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >
> > select columns from dfs.`/path/to/myfile`('JSON')
> >
> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >
> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >
> >> Drill does implicitly what Phoenix does explicitly so I don't think we
> >> should constrain ourselves to having a union of the two syntaxes.
> >>
> >>
> >> That being said, I think we could make these work together... maybe.
> >>
> >> Remove the EXTENDS without keyword syntax from the grammar.
> >>
> >> Create a new sub block in the table block that requires no keyword.
> There
> >> would be two paths (and would probably require some lookahead)
> >>
> >> option 1> unnamed parameters (1,2,3)
> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> golfHandicap INTEGER)
> >>
> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> for
> >> option 3.
> >>
> >> Best of both worlds?
> >>
> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> >> wrote:
> >>
> >>> Phoenix already supports columns at read-time using the syntax without
> >> the
> >>> EXTENDS keyword as Julian indicated:
> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>   WHERE goldHandicap < 10;
> >>>
> >>> Changing this by requiring the EXTENDS keyword would create a backward
> >>> compatibility problem.
> >>>
> >>> I think it'd be good if both of these extensions worked in Drill &
> >> Phoenix
> >>> given our Drillix initiative.
> >>>
> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >>>
> >>>> My proposal was an a or b using the freemarker template in the
> grammar,
> >>>> not something later.
> >>>>
> >>>> Actually, put another way: we may want to consider stating that we
> only
> >>>> incorporate SQL standards in our primary grammar. Any extensions
> should
> >>> be
> >>>> optional grammar. We could simply have grammar plugins in Calcite (the
> >>> same
> >>>> way we plug in external things in Drill).
> >>>>
> >>>> Trying to get every project to agree on extensions seems like it may
> be
> >>>> hard.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jacques Nadeau
> >>>> CTO and Co-Founder, Dremio
> >>>>
> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I can see why Jacques wants this syntax.
> >>>>>
> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
> >>>>> predictable. Any variation should happen at validation time, or
> later.
> >>>>>
> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
> a
> >>>>> tough design discussion.
> >>>>>
> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> they
> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> >> (by
> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL
> >>> syntax
> >>>>> should be.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >>>>>>
> >>>>>> Looking at those two examples I agree with Jacques. The first
> >> appears
> >>>>> more
> >>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
> >>> could
> >>>>>>> have a switch. Given that SQL Server and Postgres support it seems
> >>>>>>> reasonable to support the table functions without the TABLE syntax.
> >>>>>>>
> >>>>>>> I for one definitely think the TABLE syntax is much more confusing
> >> to
> >>>>> use,
> >>>>>>> especially in the example that we're looking to support, such as:
> >>>>>>>
> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> fieldDelimiter
> >>> =>
> >>>>>>> '|', skipFirstRow => true)
> >>>>>>>
> >>>>>>> This seems much clearer than:
> >>>>>>>
> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>> fieldDelimiter
> >>>>>>> => '|', skipFirstRow => true))
> >>>>>>>
> >>>>>>> It also looks much more like a hint to the table (which is our
> >> goal).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jacques Nadeau
> >>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for doing the legwork and finding what the other vendors
> >> do.
> >>>>> It is
> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >>> standard
> >>>>> an
> >>>>>>>> make the TABLE keyword optional.
> >>>>>>>>
> >>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >> clash
> >>>>> with
> >>>>>>>> one of our own (few) extensions. In
> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >>>>> EXTENDS
> >>>>>>>> clause. You can write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
> >>> table
> >>>>>>> but
> >>>>>>>> you would like to use them in this particular query. We chose to
> >>> make
> >>>>> the
> >>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> That is uncomfortably close to
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>
> >>>>>>>> so we would require
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>
> >>>>>>>> if EmpFunction was a table-function. You could combine the two
> >> forms
> >>>>> like
> >>>>>>>> this:
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>
> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >>> should
> >>>>>>> also
> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>> - Table function syntax: I did a quick search and it seems
> >> there's
> >>> no
> >>>>>>>>> consensus about this.
> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> >>> table
> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2
> >>> [4]
> >>>>>>>>> expect it.
> >>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >> generation
> >>> to
> >>>>>>>> have
> >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> >>> like
> >>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>
> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>> [2]
> >>>>>>>
> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>> [3]
> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>> [4]
> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>> [5]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>
> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>>>>>>> But that seems too easy to be true. Possibly this method is
> >> called
> >>>>> more
> >>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>
> >>>>>>>>> FYI this would happen only when using named parameter. We do want
> >>> to
> >>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>
> >>>>>>>>> I'll fill a JIRA for my other branch
> >>>>>>>>>
> >>>>>>>>> Julien
> >>>>>>>>>
> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >> Calcite
> >>>>> when
> >>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes; see below.
> >>>>>>>>>>
> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> >>> just
> >>>>>>>> being
> >>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
> >>> the
> >>>>>>> way
> >>>>>>>> to
> >>>>>>>>>> go.
> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>>>>>>> necessary?*
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Consider:
> >>>>>>>>>>
> >>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>
> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> >>> Also,
> >>>>>>> if
> >>>>>>>> f
> >>>>>>>>>> is a function with zero arguments, could you invoke it like
> >> this?:
> >>>>>>>>>>
> >>>>>>>>>> select * from f
> >>>>>>>>>>
> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >>> standards
> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>>>>>>
> >>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> >>> in
> >>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
> >> get
> >>> a
> >>>>>>> NPE
> >>>>>>>>>> later on.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
> >>> JIRA
> >>>>>>> case
> >>>>>>>>>> on resolution of overloaded functions when invoked with named
> >>>>>>> arguments.
> >>>>>>>>>> (It probably applies to all functions, not just table
> >> functions.)
> >>>>> The
> >>>>>>>> fix
> >>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>
> >>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Julian
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Julien
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Jim Scott*
> >>>>>> Director, Enterprise Strategy & Architecture
> >>>>>> +1 (347) 746-9281
> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>
> >>>>>> <http://www.mapr.com/>
> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>
> >>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>> <
> >>>>>
> >>>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

In the patch I just sent, probably not.
I will adjust it and add the corresponding test.

On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:

> Can you use both together? Say
>
>   select columns
>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND
> (foo INTEGER)
>
> Julian
>
>
>
> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > I took a stab at adding the TableFunction syntax without table(...) in
> > Calcite.
> > I have verified that both the table function and extend (with or without
> > keyword) work
> >
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >
> > These work:
> >
> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
> =>
> > '|')
> >
> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> > fieldDelimiter => '|'))
> >
> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >
> > select columns from dfs.`/path/to/myfile`('JSON')
> >
> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >
> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >
> >> Drill does implicitly what Phoenix does explicitly so I don't think we
> >> should constrain ourselves to having a union of the two syntaxes.
> >>
> >>
> >> That being said, I think we could make these work together... maybe.
> >>
> >> Remove the EXTENDS without keyword syntax from the grammar.
> >>
> >> Create a new sub block in the table block that requires no keyword.
> There
> >> would be two paths (and would probably require some lookahead)
> >>
> >> option 1> unnamed parameters (1,2,3)
> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> golfHandicap INTEGER)
> >>
> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> for
> >> option 3.
> >>
> >> Best of both worlds?
> >>
> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> >> wrote:
> >>
> >>> Phoenix already supports columns at read-time using the syntax without
> >> the
> >>> EXTENDS keyword as Julian indicated:
> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>   WHERE goldHandicap < 10;
> >>>
> >>> Changing this by requiring the EXTENDS keyword would create a backward
> >>> compatibility problem.
> >>>
> >>> I think it'd be good if both of these extensions worked in Drill &
> >> Phoenix
> >>> given our Drillix initiative.
> >>>
> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >>>
> >>>> My proposal was an a or b using the freemarker template in the
> grammar,
> >>>> not something later.
> >>>>
> >>>> Actually, put another way: we may want to consider stating that we
> only
> >>>> incorporate SQL standards in our primary grammar. Any extensions
> should
> >>> be
> >>>> optional grammar. We could simply have grammar plugins in Calcite (the
> >>> same
> >>>> way we plug in external things in Drill).
> >>>>
> >>>> Trying to get every project to agree on extensions seems like it may
> be
> >>>> hard.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jacques Nadeau
> >>>> CTO and Co-Founder, Dremio
> >>>>
> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I can see why Jacques wants this syntax.
> >>>>>
> >>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
> >>>>> predictable. Any variation should happen at validation time, or
> later.
> >>>>>
> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
> a
> >>>>> tough design discussion.
> >>>>>
> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> they
> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> >> (by
> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL
> >>> syntax
> >>>>> should be.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >>>>>>
> >>>>>> Looking at those two examples I agree with Jacques. The first
> >> appears
> >>>>> more
> >>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
> >>> could
> >>>>>>> have a switch. Given that SQL Server and Postgres support it seems
> >>>>>>> reasonable to support the table functions without the TABLE syntax.
> >>>>>>>
> >>>>>>> I for one definitely think the TABLE syntax is much more confusing
> >> to
> >>>>> use,
> >>>>>>> especially in the example that we're looking to support, such as:
> >>>>>>>
> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> fieldDelimiter
> >>> =>
> >>>>>>> '|', skipFirstRow => true)
> >>>>>>>
> >>>>>>> This seems much clearer than:
> >>>>>>>
> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>> fieldDelimiter
> >>>>>>> => '|', skipFirstRow => true))
> >>>>>>>
> >>>>>>> It also looks much more like a hint to the table (which is our
> >> goal).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jacques Nadeau
> >>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for doing the legwork and finding what the other vendors
> >> do.
> >>>>> It is
> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >>> standard
> >>>>> an
> >>>>>>>> make the TABLE keyword optional.
> >>>>>>>>
> >>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >> clash
> >>>>> with
> >>>>>>>> one of our own (few) extensions. In
> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >>>>> EXTENDS
> >>>>>>>> clause. You can write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
> >>> table
> >>>>>>> but
> >>>>>>>> you would like to use them in this particular query. We chose to
> >>> make
> >>>>> the
> >>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>
> >>>>>>>> That is uncomfortably close to
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>
> >>>>>>>> so we would require
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>
> >>>>>>>> if EmpFunction was a table-function. You could combine the two
> >> forms
> >>>>> like
> >>>>>>>> this:
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>
> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >>> should
> >>>>>>> also
> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>> - Table function syntax: I did a quick search and it seems
> >> there's
> >>> no
> >>>>>>>>> consensus about this.
> >>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> >>> table
> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2
> >>> [4]
> >>>>>>>>> expect it.
> >>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >> generation
> >>> to
> >>>>>>>> have
> >>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> >>> like
> >>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>
> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>> [2]
> >>>>>>>
> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>> [3]
> >>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>> [4]
> >>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>> [5]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>
> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>>>>>>> But that seems too easy to be true. Possibly this method is
> >> called
> >>>>> more
> >>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>
> >>>>>>>>> FYI this would happen only when using named parameter. We do want
> >>> to
> >>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>
> >>>>>>>>> I'll fill a JIRA for my other branch
> >>>>>>>>>
> >>>>>>>>> Julien
> >>>>>>>>>
> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >> Calcite
> >>>>> when
> >>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes; see below.
> >>>>>>>>>>
> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> >>> just
> >>>>>>>> being
> >>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
> >>> the
> >>>>>>> way
> >>>>>>>> to
> >>>>>>>>>> go.
> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>>>>>>> necessary?*
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Consider:
> >>>>>>>>>>
> >>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>
> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> >>> Also,
> >>>>>>> if
> >>>>>>>> f
> >>>>>>>>>> is a function with zero arguments, could you invoke it like
> >> this?:
> >>>>>>>>>>
> >>>>>>>>>> select * from f
> >>>>>>>>>>
> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >>> standards
> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>>>>>>
> >>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> >>> in
> >>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
> >> get
> >>> a
> >>>>>>> NPE
> >>>>>>>>>> later on.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
> >>> JIRA
> >>>>>>> case
> >>>>>>>>>> on resolution of overloaded functions when invoked with named
> >>>>>>> arguments.
> >>>>>>>>>> (It probably applies to all functions, not just table
> >> functions.)
> >>>>> The
> >>>>>>>> fix
> >>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>
> >>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Julian
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Julien
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Jim Scott*
> >>>>>> Director, Enterprise Strategy & Architecture
> >>>>>> +1 (347) 746-9281
> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>
> >>>>>> <http://www.mapr.com/>
> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>
> >>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>> <
> >>>>>
> >>>
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Can you use both together? Say

  select columns
  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND (foo INTEGER)

Julian



> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> I took a stab at adding the TableFunction syntax without table(...) in
> Calcite.
> I have verified that both the table function and extend (with or without
> keyword) work
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> 
> These work:
> 
> select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
> '|')
> 
> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|'))
> 
> select columns from table(dfs.`/path/to/myfile`('JSON'))
> 
> select columns from dfs.`/path/to/myfile`('JSON')
> 
> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> 
> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> Drill does implicitly what Phoenix does explicitly so I don't think we
>> should constrain ourselves to having a union of the two syntaxes.
>> 
>> 
>> That being said, I think we could make these work together... maybe.
>> 
>> Remove the EXTENDS without keyword syntax from the grammar.
>> 
>> Create a new sub block in the table block that requires no keyword. There
>> would be two paths (and would probably require some lookahead)
>> 
>> option 1> unnamed parameters (1,2,3)
>> option 2> named parameters (a => 1, b=>2, c=> 3)
>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> golfHandicap INTEGER)
>> 
>> Then we create a table function with options 1 & 2, an EXTENDS clause for
>> option 3.
>> 
>> Best of both worlds?
>> 
>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> wrote:
>> 
>>> Phoenix already supports columns at read-time using the syntax without
>> the
>>> EXTENDS keyword as Julian indicated:
>>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>   WHERE goldHandicap < 10;
>>> 
>>> Changing this by requiring the EXTENDS keyword would create a backward
>>> compatibility problem.
>>> 
>>> I think it'd be good if both of these extensions worked in Drill &
>> Phoenix
>>> given our Drillix initiative.
>>> 
>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>> 
>>>> My proposal was an a or b using the freemarker template in the grammar,
>>>> not something later.
>>>> 
>>>> Actually, put another way: we may want to consider stating that we only
>>>> incorporate SQL standards in our primary grammar. Any extensions should
>>> be
>>>> optional grammar. We could simply have grammar plugins in Calcite (the
>>> same
>>>> way we plug in external things in Drill).
>>>> 
>>>> Trying to get every project to agree on extensions seems like it may be
>>>> hard.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I can see why Jacques wants this syntax.
>>>>> 
>>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>>>>> predictable. Any variation should happen at validation time, or later.
>>>>> 
>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>>>>> tough design discussion.
>>>>> 
>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>> they
>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
>> (by
>>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>>> syntax
>>>>> should be.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>>>>>> 
>>>>>> Looking at those two examples I agree with Jacques. The first
>> appears
>>>>> more
>>>>>> like a hint from the syntactic sugar point of view.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>> wrote:
>>>>>> 
>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
>>> could
>>>>>>> have a switch. Given that SQL Server and Postgres support it seems
>>>>>>> reasonable to support the table functions without the TABLE syntax.
>>>>>>> 
>>>>>>> I for one definitely think the TABLE syntax is much more confusing
>> to
>>>>> use,
>>>>>>> especially in the example that we're looking to support, such as:
>>>>>>> 
>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>>> =>
>>>>>>> '|', skipFirstRow => true)
>>>>>>> 
>>>>>>> This seems much clearer than:
>>>>>>> 
>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>> fieldDelimiter
>>>>>>> => '|', skipFirstRow => true))
>>>>>>> 
>>>>>>> It also looks much more like a hint to the table (which is our
>> goal).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for doing the legwork and finding what the other vendors
>> do.
>>>>> It is
>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>> standard
>>>>> an
>>>>>>>> make the TABLE keyword optional.
>>>>>>>> 
>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>> clash
>>>>> with
>>>>>>>> one of our own (few) extensions. In
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>> EXTENDS
>>>>>>>> clause. You can write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
>>> table
>>>>>>> but
>>>>>>>> you would like to use them in this particular query. We chose to
>>> make
>>>>> the
>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> That is uncomfortably close to
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>> 
>>>>>>>> so we would require
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>> 
>>>>>>>> if EmpFunction was a table-function. You could combine the two
>> forms
>>>>> like
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>> 
>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>> should
>>>>>>> also
>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> - Table function syntax: I did a quick search and it seems
>> there's
>>> no
>>>>>>>>> consensus about this.
>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling
>>> table
>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2
>>> [4]
>>>>>>>>> expect it.
>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>> generation
>>> to
>>>>>>>> have
>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL
>>> like
>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>> 
>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>> [2]
>>>>>>> 
>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>> [3]
>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>> [4]
>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>> [5]
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>> 
>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>> But that seems too easy to be true. Possibly this method is
>> called
>>>>> more
>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>> 
>>>>>>>>> FYI this would happen only when using named parameter. We do want
>>> to
>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>> 
>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>> 
>>>>>>>>> Julien
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> Calcite
>>>>> when
>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes; see below.
>>>>>>>>>> 
>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
>>> just
>>>>>>>> being
>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
>>> the
>>>>>>> way
>>>>>>>> to
>>>>>>>>>> go.
>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>>>>>>> necessary?*
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Consider:
>>>>>>>>>> 
>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>> 
>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>>> Also,
>>>>>>> if
>>>>>>>> f
>>>>>>>>>> is a function with zero arguments, could you invoke it like
>> this?:
>>>>>>>>>> 
>>>>>>>>>> select * from f
>>>>>>>>>> 
>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>> standards
>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>> 
>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>>> in
>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> get
>>> a
>>>>>>> NPE
>>>>>>>>>> later on.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>>> JIRA
>>>>>>> case
>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>> arguments.
>>>>>>>>>> (It probably applies to all functions, not just table
>> functions.)
>>>>> The
>>>>>>>> fix
>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>> 
>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Jim Scott*
>>>>>> Director, Enterprise Strategy & Architecture
>>>>>> +1 (347) 746-9281
>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>> 
>>>>>> <http://www.mapr.com/>
>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>> 
>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>> <
>>>>> 
>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Can you use both together? Say

  select columns
  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’) EXTEND (foo INTEGER)

Julian



> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> I took a stab at adding the TableFunction syntax without table(...) in
> Calcite.
> I have verified that both the table function and extend (with or without
> keyword) work
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> 
> These work:
> 
> select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
> '|')
> 
> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|'))
> 
> select columns from table(dfs.`/path/to/myfile`('JSON'))
> 
> select columns from dfs.`/path/to/myfile`('JSON')
> 
> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> 
> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> Drill does implicitly what Phoenix does explicitly so I don't think we
>> should constrain ourselves to having a union of the two syntaxes.
>> 
>> 
>> That being said, I think we could make these work together... maybe.
>> 
>> Remove the EXTENDS without keyword syntax from the grammar.
>> 
>> Create a new sub block in the table block that requires no keyword. There
>> would be two paths (and would probably require some lookahead)
>> 
>> option 1> unnamed parameters (1,2,3)
>> option 2> named parameters (a => 1, b=>2, c=> 3)
>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> golfHandicap INTEGER)
>> 
>> Then we create a table function with options 1 & 2, an EXTENDS clause for
>> option 3.
>> 
>> Best of both worlds?
>> 
>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> wrote:
>> 
>>> Phoenix already supports columns at read-time using the syntax without
>> the
>>> EXTENDS keyword as Julian indicated:
>>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>   WHERE goldHandicap < 10;
>>> 
>>> Changing this by requiring the EXTENDS keyword would create a backward
>>> compatibility problem.
>>> 
>>> I think it'd be good if both of these extensions worked in Drill &
>> Phoenix
>>> given our Drillix initiative.
>>> 
>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>> 
>>>> My proposal was an a or b using the freemarker template in the grammar,
>>>> not something later.
>>>> 
>>>> Actually, put another way: we may want to consider stating that we only
>>>> incorporate SQL standards in our primary grammar. Any extensions should
>>> be
>>>> optional grammar. We could simply have grammar plugins in Calcite (the
>>> same
>>>> way we plug in external things in Drill).
>>>> 
>>>> Trying to get every project to agree on extensions seems like it may be
>>>> hard.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I can see why Jacques wants this syntax.
>>>>> 
>>>>> However a “switch" in a grammar is a bad idea. Grammars need to be
>>>>> predictable. Any variation should happen at validation time, or later.
>>>>> 
>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>>>>> tough design discussion.
>>>>> 
>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>> they
>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
>> (by
>>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>>> syntax
>>>>> should be.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>>>>>> 
>>>>>> Looking at those two examples I agree with Jacques. The first
>> appears
>>>>> more
>>>>>> like a hint from the syntactic sugar point of view.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>> wrote:
>>>>>> 
>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
>>> could
>>>>>>> have a switch. Given that SQL Server and Postgres support it seems
>>>>>>> reasonable to support the table functions without the TABLE syntax.
>>>>>>> 
>>>>>>> I for one definitely think the TABLE syntax is much more confusing
>> to
>>>>> use,
>>>>>>> especially in the example that we're looking to support, such as:
>>>>>>> 
>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>>> =>
>>>>>>> '|', skipFirstRow => true)
>>>>>>> 
>>>>>>> This seems much clearer than:
>>>>>>> 
>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>> fieldDelimiter
>>>>>>> => '|', skipFirstRow => true))
>>>>>>> 
>>>>>>> It also looks much more like a hint to the table (which is our
>> goal).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for doing the legwork and finding what the other vendors
>> do.
>>>>> It is
>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>> standard
>>>>> an
>>>>>>>> make the TABLE keyword optional.
>>>>>>>> 
>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>> clash
>>>>> with
>>>>>>>> one of our own (few) extensions. In
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>> EXTENDS
>>>>>>>> clause. You can write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
>>> table
>>>>>>> but
>>>>>>>> you would like to use them in this particular query. We chose to
>>> make
>>>>> the
>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>> 
>>>>>>>> That is uncomfortably close to
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>> 
>>>>>>>> so we would require
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>> 
>>>>>>>> if EmpFunction was a table-function. You could combine the two
>> forms
>>>>> like
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>> 
>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>> should
>>>>>>> also
>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> - Table function syntax: I did a quick search and it seems
>> there's
>>> no
>>>>>>>>> consensus about this.
>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow calling
>>> table
>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2
>>> [4]
>>>>>>>>> expect it.
>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>> generation
>>> to
>>>>>>>> have
>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the MySQL
>>> like
>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>> 
>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>> [2]
>>>>>>> 
>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>> [3]
>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>> [4]
>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>> [5]
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>> 
>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>> But that seems too easy to be true. Possibly this method is
>> called
>>>>> more
>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>> 
>>>>>>>>> FYI this would happen only when using named parameter. We do want
>>> to
>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>> 
>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>> 
>>>>>>>>> Julien
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> Calcite
>>>>> when
>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes; see below.
>>>>>>>>>> 
>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
>>> just
>>>>>>>> being
>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
>>> the
>>>>>>> way
>>>>>>>> to
>>>>>>>>>> go.
>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>>>>>>> necessary?*
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Consider:
>>>>>>>>>> 
>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>> 
>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>>> Also,
>>>>>>> if
>>>>>>>> f
>>>>>>>>>> is a function with zero arguments, could you invoke it like
>> this?:
>>>>>>>>>> 
>>>>>>>>>> select * from f
>>>>>>>>>> 
>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>> standards
>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>> 
>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>>> in
>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> get
>>> a
>>>>>>> NPE
>>>>>>>>>> later on.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>>> JIRA
>>>>>>> case
>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>> arguments.
>>>>>>>>>> (It probably applies to all functions, not just table
>> functions.)
>>>>> The
>>>>>>>> fix
>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>> 
>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Jim Scott*
>>>>>> Director, Enterprise Strategy & Architecture
>>>>>> +1 (347) 746-9281
>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>> 
>>>>>> <http://www.mapr.com/>
>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>> 
>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>> <
>>>>> 
>>> 
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I took a stab at adding the TableFunction syntax without table(...) in
Calcite.
I have verified that both the table function and extend (with or without
keyword) work
https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34

These work:

select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
'|')

select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
fieldDelimiter => '|'))

select columns from table(dfs.`/path/to/myfile`('JSON'))

select columns from dfs.`/path/to/myfile`('JSON')

select columns from dfs.`/path/to/myfile`(type => 'JSON')

On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Drill does implicitly what Phoenix does explicitly so I don't think we
> should constrain ourselves to having a union of the two syntaxes.
>
>
> That being said, I think we could make these work together... maybe.
>
> Remove the EXTENDS without keyword syntax from the grammar.
>
> Create a new sub block in the table block that requires no keyword. There
> would be two paths (and would probably require some lookahead)
>
> option 1> unnamed parameters (1,2,3)
> option 2> named parameters (a => 1, b=>2, c=> 3)
> option 3> create table field pattern (favoriteBand VARCHAR(100),
> golfHandicap INTEGER)
>
> Then we create a table function with options 1 & 2, an EXTENDS clause for
> option 3.
>
> Best of both worlds?
>
> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> wrote:
>
> > Phoenix already supports columns at read-time using the syntax without
> the
> > EXTENDS keyword as Julian indicated:
> >    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >    WHERE goldHandicap < 10;
> >
> > Changing this by requiring the EXTENDS keyword would create a backward
> > compatibility problem.
> >
> > I think it'd be good if both of these extensions worked in Drill &
> Phoenix
> > given our Drillix initiative.
> >
> > On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> > > My proposal was an a or b using the freemarker template in the grammar,
> > > not something later.
> > >
> > > Actually, put another way: we may want to consider stating that we only
> > > incorporate SQL standards in our primary grammar. Any extensions should
> > be
> > > optional grammar. We could simply have grammar plugins in Calcite (the
> > same
> > > way we plug in external things in Drill).
> > >
> > > Trying to get every project to agree on extensions seems like it may be
> > > hard.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >> I can see why Jacques wants this syntax.
> > >>
> > >> However a “switch" in a grammar is a bad idea. Grammars need to be
> > >> predictable. Any variation should happen at validation time, or later.
> > >>
> > >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> > >> tough design discussion.
> > >>
> > >> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> they
> > >> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> (by
> > >> which I mean Jacques and James, I guess) need to agree what the SQL
> > syntax
> > >> should be.
> > >>
> > >> Julian
> > >>
> > >>
> > >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> > >> >
> > >> > Looking at those two examples I agree with Jacques. The first
> appears
> > >> more
> > >> > like a hint from the syntactic sugar point of view.
> > >> >
> > >> >
> > >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >
> > >> wrote:
> > >> >
> > >> >> Since EXTEND is custom functionality, it seems reasonable that we
> > could
> > >> >> have a switch. Given that SQL Server and Postgres support it seems
> > >> >> reasonable to support the table functions without the TABLE syntax.
> > >> >>
> > >> >> I for one definitely think the TABLE syntax is much more confusing
> to
> > >> use,
> > >> >> especially in the example that we're looking to support, such as:
> > >> >>
> > >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> fieldDelimiter
> > =>
> > >> >> '|', skipFirstRow => true)
> > >> >>
> > >> >> This seems much clearer than:
> > >> >>
> > >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> > >> fieldDelimiter
> > >> >> => '|', skipFirstRow => true))
> > >> >>
> > >> >> It also looks much more like a hint to the table (which is our
> goal).
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Jacques Nadeau
> > >> >> CTO and Co-Founder, Dremio
> > >> >>
> > >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> > wrote:
> > >> >>
> > >> >>> Thanks for doing the legwork and finding what the other vendors
> do.
> > >> It is
> > >> >>> indeed compelling that SQL Server and Postgres go beyond the
> > standard
> > >> an
> > >> >>> make the TABLE keyword optional.
> > >> >>>
> > >> >>> I tried that syntax in Calcite and discovered that there is a
> clash
> > >> with
> > >> >>> one of our own (few) extensions. In
> > >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> > >> EXTENDS
> > >> >>> clause. You can write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> to tell Calcite that there are two undeclared columns in the Emp
> > table
> > >> >> but
> > >> >>> you would like to use them in this particular query. We chose to
> > make
> > >> the
> > >> >>> EXTEND keyword optional, so you could instead write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> That is uncomfortably close to
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> > >> >>>
> > >> >>> so we would require
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> > >> >>>
> > >> >>> if EmpFunction was a table-function. You could combine the two
> forms
> > >> like
> > >> >>> this:
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > >> >>> (anotherAttribute INTEGER);
> > >> >>>
> > >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> > should
> > >> >> also
> > >> >>> ask whether requiring folks to type TABLE is such a hardship.
> > >> >>>
> > >> >>> Julian
> > >> >>>
> > >> >>>
> > >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > >> >>>>
> > >> >>>> - Table function syntax: I did a quick search and it seems
> there's
> > no
> > >> >>>> consensus about this.
> > >> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> > table
> > >> >>>> functions without the table(...) wrapper while Oracle [3] and DB2
> > [4]
> > >> >>>> expect it.
> > >> >>>> MySQL does not have table functions [5]
> > >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> > >> >>>> Would it be reasonable to allow a switch in the grammar
> generation
> > to
> > >> >>> have
> > >> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> > like
> > >> >>>> syntax (back ticks for identifiers etc)
> > >> >>>>
> > >> >>>> [1]
> > >> >>
> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > >> >>>> [2]
> > >> >>
> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > >> >>>> [3]
> > https://oracle-base.com/articles/misc/pipelined-table-functions
> > >> >>>> [4]
> > http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > >> >>>> [5]
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >> >>>>
> > >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> > >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> > >> >>>> But that seems too easy to be true. Possibly this method is
> called
> > >> more
> > >> >>>> than once (before and after the function has been resolved?)
> > >> >>>>
> > >> >>>> FYI this would happen only when using named parameter. We do want
> > to
> > >> >>>> overload in this case, which is why I'm looking into it.
> > >> >>>>
> > >> >>>> I'll fill a JIRA for my other branch
> > >> >>>>
> > >> >>>> Julien
> > >> >>>>
> > >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> > >> wrote:
> > >> >>>>
> > >> >>>>>
> > >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> > >> wrote:
> > >> >>>>>
> > >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in
> Calcite
> > >> when
> > >> >>>>> there's more than 1 function with the same name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes; see below.
> > >> >>>>>
> > >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> > just
> > >> >>> being
> > >> >>>>> able to specify the delimiter for csv files.
> > >> >>>>> So it seem the answer to my question 1) is that TableMacros are
> > the
> > >> >> way
> > >> >>> to
> > >> >>>>> go.
> > >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> >>>>> necessary?*
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Consider:
> > >> >>>>>
> > >> >>>>> select * from myTable as f(x, y)
> > >> >>>>> select * from myTable f(x, y)
> > >> >>>>> select * from myFunction(x, y)
> > >> >>>>>
> > >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> > Also,
> > >> >> if
> > >> >>> f
> > >> >>>>> is a function with zero arguments, could you invoke it like
> this?:
> > >> >>>>>
> > >> >>>>> select * from f
> > >> >>>>>
> > >> >>>>> I don’t know the actual rationale. But I know that the SQL
> > standards
> > >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> > >> >>>>>
> > >> >>>>> I had to fix some things in Calcite to enable this:
> > >> >>>>> https://github.com/dremio/calcite/pull/1/files
> > >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> > in
> > >> >>>>> Calcite for the Maze example.
> > >> >>>>> Which is why some hooks were missing.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Can you log a jira case to track this bug?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> > >> >>>>> Here is a test that reproduces the problem:
> > >> >>>>> https://github.com/apache/calcite/pull/166
> > >> >>>>> If we return more than 1 TableFunction with the same name, we
> get
> > a
> > >> >> NPE
> > >> >>>>> later on.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> > JIRA
> > >> >> case
> > >> >>>>> on resolution of overloaded functions when invoked with named
> > >> >> arguments.
> > >> >>>>> (It probably applies to all functions, not just table
> functions.)
> > >> The
> > >> >>> fix
> > >> >>>>> will take a while (if you wait for me to write it).
> > >> >>>>>
> > >> >>>>> For now please tell your users not to overload. :)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Julian
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Jim Scott*
> > >> > Director, Enterprise Strategy & Architecture
> > >> > +1 (347) 746-9281
> > >> > @kingmesal <https://twitter.com/kingmesal>
> > >> >
> > >> > <http://www.mapr.com/>
> > >> > [image: MapR Technologies] <http://www.mapr.com>
> > >> >
> > >> > Now Available - Free Hadoop On-Demand Training
> > >> > <
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> >
> > >>
> > >>
> > >
> >
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I took a stab at adding the TableFunction syntax without table(...) in
Calcite.
I have verified that both the table function and extend (with or without
keyword) work
https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34

These work:

select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
'|')

select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
fieldDelimiter => '|'))

select columns from table(dfs.`/path/to/myfile`('JSON'))

select columns from dfs.`/path/to/myfile`('JSON')

select columns from dfs.`/path/to/myfile`(type => 'JSON')

On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Drill does implicitly what Phoenix does explicitly so I don't think we
> should constrain ourselves to having a union of the two syntaxes.
>
>
> That being said, I think we could make these work together... maybe.
>
> Remove the EXTENDS without keyword syntax from the grammar.
>
> Create a new sub block in the table block that requires no keyword. There
> would be two paths (and would probably require some lookahead)
>
> option 1> unnamed parameters (1,2,3)
> option 2> named parameters (a => 1, b=>2, c=> 3)
> option 3> create table field pattern (favoriteBand VARCHAR(100),
> golfHandicap INTEGER)
>
> Then we create a table function with options 1 & 2, an EXTENDS clause for
> option 3.
>
> Best of both worlds?
>
> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> wrote:
>
> > Phoenix already supports columns at read-time using the syntax without
> the
> > EXTENDS keyword as Julian indicated:
> >    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >    WHERE goldHandicap < 10;
> >
> > Changing this by requiring the EXTENDS keyword would create a backward
> > compatibility problem.
> >
> > I think it'd be good if both of these extensions worked in Drill &
> Phoenix
> > given our Drillix initiative.
> >
> > On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> > > My proposal was an a or b using the freemarker template in the grammar,
> > > not something later.
> > >
> > > Actually, put another way: we may want to consider stating that we only
> > > incorporate SQL standards in our primary grammar. Any extensions should
> > be
> > > optional grammar. We could simply have grammar plugins in Calcite (the
> > same
> > > way we plug in external things in Drill).
> > >
> > > Trying to get every project to agree on extensions seems like it may be
> > > hard.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >> I can see why Jacques wants this syntax.
> > >>
> > >> However a “switch" in a grammar is a bad idea. Grammars need to be
> > >> predictable. Any variation should happen at validation time, or later.
> > >>
> > >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> > >> tough design discussion.
> > >>
> > >> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> they
> > >> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> (by
> > >> which I mean Jacques and James, I guess) need to agree what the SQL
> > syntax
> > >> should be.
> > >>
> > >> Julian
> > >>
> > >>
> > >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> > >> >
> > >> > Looking at those two examples I agree with Jacques. The first
> appears
> > >> more
> > >> > like a hint from the syntactic sugar point of view.
> > >> >
> > >> >
> > >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >
> > >> wrote:
> > >> >
> > >> >> Since EXTEND is custom functionality, it seems reasonable that we
> > could
> > >> >> have a switch. Given that SQL Server and Postgres support it seems
> > >> >> reasonable to support the table functions without the TABLE syntax.
> > >> >>
> > >> >> I for one definitely think the TABLE syntax is much more confusing
> to
> > >> use,
> > >> >> especially in the example that we're looking to support, such as:
> > >> >>
> > >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> fieldDelimiter
> > =>
> > >> >> '|', skipFirstRow => true)
> > >> >>
> > >> >> This seems much clearer than:
> > >> >>
> > >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> > >> fieldDelimiter
> > >> >> => '|', skipFirstRow => true))
> > >> >>
> > >> >> It also looks much more like a hint to the table (which is our
> goal).
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Jacques Nadeau
> > >> >> CTO and Co-Founder, Dremio
> > >> >>
> > >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> > wrote:
> > >> >>
> > >> >>> Thanks for doing the legwork and finding what the other vendors
> do.
> > >> It is
> > >> >>> indeed compelling that SQL Server and Postgres go beyond the
> > standard
> > >> an
> > >> >>> make the TABLE keyword optional.
> > >> >>>
> > >> >>> I tried that syntax in Calcite and discovered that there is a
> clash
> > >> with
> > >> >>> one of our own (few) extensions. In
> > >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> > >> EXTENDS
> > >> >>> clause. You can write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> to tell Calcite that there are two undeclared columns in the Emp
> > table
> > >> >> but
> > >> >>> you would like to use them in this particular query. We chose to
> > make
> > >> the
> > >> >>> EXTEND keyword optional, so you could instead write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE goldHandicap < 10;
> > >> >>>
> > >> >>> That is uncomfortably close to
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> > >> >>>
> > >> >>> so we would require
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> > >> >>>
> > >> >>> if EmpFunction was a table-function. You could combine the two
> forms
> > >> like
> > >> >>> this:
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > >> >>> (anotherAttribute INTEGER);
> > >> >>>
> > >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> > should
> > >> >> also
> > >> >>> ask whether requiring folks to type TABLE is such a hardship.
> > >> >>>
> > >> >>> Julian
> > >> >>>
> > >> >>>
> > >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > >> >>>>
> > >> >>>> - Table function syntax: I did a quick search and it seems
> there's
> > no
> > >> >>>> consensus about this.
> > >> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> > table
> > >> >>>> functions without the table(...) wrapper while Oracle [3] and DB2
> > [4]
> > >> >>>> expect it.
> > >> >>>> MySQL does not have table functions [5]
> > >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> > >> >>>> Would it be reasonable to allow a switch in the grammar
> generation
> > to
> > >> >>> have
> > >> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> > like
> > >> >>>> syntax (back ticks for identifiers etc)
> > >> >>>>
> > >> >>>> [1]
> > >> >>
> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > >> >>>> [2]
> > >> >>
> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > >> >>>> [3]
> > https://oracle-base.com/articles/misc/pipelined-table-functions
> > >> >>>> [4]
> > http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > >> >>>> [5]
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >> >>>>
> > >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> > >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> > >> >>>> But that seems too easy to be true. Possibly this method is
> called
> > >> more
> > >> >>>> than once (before and after the function has been resolved?)
> > >> >>>>
> > >> >>>> FYI this would happen only when using named parameter. We do want
> > to
> > >> >>>> overload in this case, which is why I'm looking into it.
> > >> >>>>
> > >> >>>> I'll fill a JIRA for my other branch
> > >> >>>>
> > >> >>>> Julien
> > >> >>>>
> > >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> > >> wrote:
> > >> >>>>
> > >> >>>>>
> > >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> > >> wrote:
> > >> >>>>>
> > >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in
> Calcite
> > >> when
> > >> >>>>> there's more than 1 function with the same name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes; see below.
> > >> >>>>>
> > >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> > just
> > >> >>> being
> > >> >>>>> able to specify the delimiter for csv files.
> > >> >>>>> So it seem the answer to my question 1) is that TableMacros are
> > the
> > >> >> way
> > >> >>> to
> > >> >>>>> go.
> > >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> >>>>> necessary?*
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Consider:
> > >> >>>>>
> > >> >>>>> select * from myTable as f(x, y)
> > >> >>>>> select * from myTable f(x, y)
> > >> >>>>> select * from myFunction(x, y)
> > >> >>>>>
> > >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> > Also,
> > >> >> if
> > >> >>> f
> > >> >>>>> is a function with zero arguments, could you invoke it like
> this?:
> > >> >>>>>
> > >> >>>>> select * from f
> > >> >>>>>
> > >> >>>>> I don’t know the actual rationale. But I know that the SQL
> > standards
> > >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> > >> >>>>>
> > >> >>>>> I had to fix some things in Calcite to enable this:
> > >> >>>>> https://github.com/dremio/calcite/pull/1/files
> > >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> > in
> > >> >>>>> Calcite for the Maze example.
> > >> >>>>> Which is why some hooks were missing.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Can you log a jira case to track this bug?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> > >> >>>>> Here is a test that reproduces the problem:
> > >> >>>>> https://github.com/apache/calcite/pull/166
> > >> >>>>> If we return more than 1 TableFunction with the same name, we
> get
> > a
> > >> >> NPE
> > >> >>>>> later on.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> > JIRA
> > >> >> case
> > >> >>>>> on resolution of overloaded functions when invoked with named
> > >> >> arguments.
> > >> >>>>> (It probably applies to all functions, not just table
> functions.)
> > >> The
> > >> >>> fix
> > >> >>>>> will take a while (if you wait for me to write it).
> > >> >>>>>
> > >> >>>>> For now please tell your users not to overload. :)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Julian
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Jim Scott*
> > >> > Director, Enterprise Strategy & Architecture
> > >> > +1 (347) 746-9281
> > >> > @kingmesal <https://twitter.com/kingmesal>
> > >> >
> > >> > <http://www.mapr.com/>
> > >> > [image: MapR Technologies] <http://www.mapr.com>
> > >> >
> > >> > Now Available - Free Hadoop On-Demand Training
> > >> > <
> > >>
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >> >
> > >>
> > >>
> > >
> >
>



-- 
Julien

Re: select from table with options

Posted by Jacques Nadeau <ja...@apache.org>.

Drill does implicitly what Phoenix does explicitly so I don't think we
should constrain ourselves to having a union of the two syntaxes.


That being said, I think we could make these work together... maybe.

Remove the EXTENDS without keyword syntax from the grammar.

Create a new sub block in the table block that requires no keyword. There
would be two paths (and would probably require some lookahead)

option 1> unnamed parameters (1,2,3)
option 2> named parameters (a => 1, b=>2, c=> 3)
option 3> create table field pattern (favoriteBand VARCHAR(100),
golfHandicap INTEGER)

Then we create a table function with options 1 & 2, an EXTENDS clause for
option 3.

Best of both worlds?

On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org> wrote:

> Phoenix already supports columns at read-time using the syntax without the
> EXTENDS keyword as Julian indicated:
>    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>    WHERE goldHandicap < 10;
>
> Changing this by requiring the EXTENDS keyword would create a backward
> compatibility problem.
>
> I think it'd be good if both of these extensions worked in Drill & Phoenix
> given our Drillix initiative.
>
> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:
>
> > My proposal was an a or b using the freemarker template in the grammar,
> > not something later.
> >
> > Actually, put another way: we may want to consider stating that we only
> > incorporate SQL standards in our primary grammar. Any extensions should
> be
> > optional grammar. We could simply have grammar plugins in Calcite (the
> same
> > way we plug in external things in Drill).
> >
> > Trying to get every project to agree on extensions seems like it may be
> > hard.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> I can see why Jacques wants this syntax.
> >>
> >> However a “switch" in a grammar is a bad idea. Grammars need to be
> >> predictable. Any variation should happen at validation time, or later.
> >>
> >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> >> tough design discussion.
> >>
> >> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
> >> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> >> which I mean Jacques and James, I guess) need to agree what the SQL
> syntax
> >> should be.
> >>
> >> Julian
> >>
> >>
> >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >> >
> >> > Looking at those two examples I agree with Jacques. The first appears
> >> more
> >> > like a hint from the syntactic sugar point of view.
> >> >
> >> >
> >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >> >
> >> >> Since EXTEND is custom functionality, it seems reasonable that we
> could
> >> >> have a switch. Given that SQL Server and Postgres support it seems
> >> >> reasonable to support the table functions without the TABLE syntax.
> >> >>
> >> >> I for one definitely think the TABLE syntax is much more confusing to
> >> use,
> >> >> especially in the example that we're looking to support, such as:
> >> >>
> >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> =>
> >> >> '|', skipFirstRow => true)
> >> >>
> >> >> This seems much clearer than:
> >> >>
> >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >> fieldDelimiter
> >> >> => '|', skipFirstRow => true))
> >> >>
> >> >> It also looks much more like a hint to the table (which is our goal).
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jacques Nadeau
> >> >> CTO and Co-Founder, Dremio
> >> >>
> >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >> >>
> >> >>> Thanks for doing the legwork and finding what the other vendors do.
> >> It is
> >> >>> indeed compelling that SQL Server and Postgres go beyond the
> standard
> >> an
> >> >>> make the TABLE keyword optional.
> >> >>>
> >> >>> I tried that syntax in Calcite and discovered that there is a clash
> >> with
> >> >>> one of our own (few) extensions. In
> >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >> EXTENDS
> >> >>> clause. You can write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE goldHandicap < 10;
> >> >>>
> >> >>> to tell Calcite that there are two undeclared columns in the Emp
> table
> >> >> but
> >> >>> you would like to use them in this particular query. We chose to
> make
> >> the
> >> >>> EXTEND keyword optional, so you could instead write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE goldHandicap < 10;
> >> >>>
> >> >>> That is uncomfortably close to
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> >> >>>
> >> >>> so we would require
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >> >>>
> >> >>> if EmpFunction was a table-function. You could combine the two forms
> >> like
> >> >>> this:
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >> >>> (anotherAttribute INTEGER);
> >> >>>
> >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> should
> >> >> also
> >> >>> ask whether requiring folks to type TABLE is such a hardship.
> >> >>>
> >> >>> Julian
> >> >>>
> >> >>>
> >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >> >>>>
> >> >>>> - Table function syntax: I did a quick search and it seems there's
> no
> >> >>>> consensus about this.
> >> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> table
> >> >>>> functions without the table(...) wrapper while Oracle [3] and DB2
> [4]
> >> >>>> expect it.
> >> >>>> MySQL does not have table functions [5]
> >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >> >>>> Would it be reasonable to allow a switch in the grammar generation
> to
> >> >>> have
> >> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> like
> >> >>>> syntax (back ticks for identifiers etc)
> >> >>>>
> >> >>>> [1]
> >> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >> >>>> [2]
> >> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >> >>>> [3]
> https://oracle-base.com/articles/misc/pipelined-table-functions
> >> >>>> [4]
> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >> >>>> [5]
> >> >>>>
> >> >>>
> >> >>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >> >>>>
> >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> >> >>>> But that seems too easy to be true. Possibly this method is called
> >> more
> >> >>>> than once (before and after the function has been resolved?)
> >> >>>>
> >> >>>> FYI this would happen only when using named parameter. We do want
> to
> >> >>>> overload in this case, which is why I'm looking into it.
> >> >>>>
> >> >>>> I'll fill a JIRA for my other branch
> >> >>>>
> >> >>>> Julien
> >> >>>>
> >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >> >>>>
> >> >>>>>
> >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >> >>>>>
> >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> >> when
> >> >>>>> there's more than 1 function with the same name.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes; see below.
> >> >>>>>
> >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> just
> >> >>> being
> >> >>>>> able to specify the delimiter for csv files.
> >> >>>>> So it seem the answer to my question 1) is that TableMacros are
> the
> >> >> way
> >> >>> to
> >> >>>>> go.
> >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >> >>>>> necessary?*
> >> >>>>>
> >> >>>>>
> >> >>>>> Consider:
> >> >>>>>
> >> >>>>> select * from myTable as f(x, y)
> >> >>>>> select * from myTable f(x, y)
> >> >>>>> select * from myFunction(x, y)
> >> >>>>>
> >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> Also,
> >> >> if
> >> >>> f
> >> >>>>> is a function with zero arguments, could you invoke it like this?:
> >> >>>>>
> >> >>>>> select * from f
> >> >>>>>
> >> >>>>> I don’t know the actual rationale. But I know that the SQL
> standards
> >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> >> >>>>>
> >> >>>>> I had to fix some things in Calcite to enable this:
> >> >>>>> https://github.com/dremio/calcite/pull/1/files
> >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> in
> >> >>>>> Calcite for the Maze example.
> >> >>>>> Which is why some hooks were missing.
> >> >>>>>
> >> >>>>>
> >> >>>>> Can you log a jira case to track this bug?
> >> >>>>>
> >> >>>>>
> >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >> >>>>> Here is a test that reproduces the problem:
> >> >>>>> https://github.com/apache/calcite/pull/166
> >> >>>>> If we return more than 1 TableFunction with the same name, we get
> a
> >> >> NPE
> >> >>>>> later on.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> JIRA
> >> >> case
> >> >>>>> on resolution of overloaded functions when invoked with named
> >> >> arguments.
> >> >>>>> (It probably applies to all functions, not just table functions.)
> >> The
> >> >>> fix
> >> >>>>> will take a while (if you wait for me to write it).
> >> >>>>>
> >> >>>>> For now please tell your users not to overload. :)
> >> >>>>>
> >> >>>>>
> >> >>>>> Julian
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Julien
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > *Jim Scott*
> >> > Director, Enterprise Strategy & Architecture
> >> > +1 (347) 746-9281
> >> > @kingmesal <https://twitter.com/kingmesal>
> >> >
> >> > <http://www.mapr.com/>
> >> > [image: MapR Technologies] <http://www.mapr.com>
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> >
> >>
> >>
> >
>

Re: select from table with options

Posted by Jacques Nadeau <ja...@apache.org>.

Drill does implicitly what Phoenix does explicitly so I don't think we
should constrain ourselves to having a union of the two syntaxes.


That being said, I think we could make these work together... maybe.

Remove the EXTENDS without keyword syntax from the grammar.

Create a new sub block in the table block that requires no keyword. There
would be two paths (and would probably require some lookahead)

option 1> unnamed parameters (1,2,3)
option 2> named parameters (a => 1, b=>2, c=> 3)
option 3> create table field pattern (favoriteBand VARCHAR(100),
golfHandicap INTEGER)

Then we create a table function with options 1 & 2, an EXTENDS clause for
option 3.

Best of both worlds?

On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org> wrote:

> Phoenix already supports columns at read-time using the syntax without the
> EXTENDS keyword as Julian indicated:
>    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>    WHERE goldHandicap < 10;
>
> Changing this by requiring the EXTENDS keyword would create a backward
> compatibility problem.
>
> I think it'd be good if both of these extensions worked in Drill & Phoenix
> given our Drillix initiative.
>
> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:
>
> > My proposal was an a or b using the freemarker template in the grammar,
> > not something later.
> >
> > Actually, put another way: we may want to consider stating that we only
> > incorporate SQL standards in our primary grammar. Any extensions should
> be
> > optional grammar. We could simply have grammar plugins in Calcite (the
> same
> > way we plug in external things in Drill).
> >
> > Trying to get every project to agree on extensions seems like it may be
> > hard.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> I can see why Jacques wants this syntax.
> >>
> >> However a “switch" in a grammar is a bad idea. Grammars need to be
> >> predictable. Any variation should happen at validation time, or later.
> >>
> >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> >> tough design discussion.
> >>
> >> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
> >> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> >> which I mean Jacques and James, I guess) need to agree what the SQL
> syntax
> >> should be.
> >>
> >> Julian
> >>
> >>
> >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >> >
> >> > Looking at those two examples I agree with Jacques. The first appears
> >> more
> >> > like a hint from the syntactic sugar point of view.
> >> >
> >> >
> >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >> >
> >> >> Since EXTEND is custom functionality, it seems reasonable that we
> could
> >> >> have a switch. Given that SQL Server and Postgres support it seems
> >> >> reasonable to support the table functions without the TABLE syntax.
> >> >>
> >> >> I for one definitely think the TABLE syntax is much more confusing to
> >> use,
> >> >> especially in the example that we're looking to support, such as:
> >> >>
> >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> =>
> >> >> '|', skipFirstRow => true)
> >> >>
> >> >> This seems much clearer than:
> >> >>
> >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >> fieldDelimiter
> >> >> => '|', skipFirstRow => true))
> >> >>
> >> >> It also looks much more like a hint to the table (which is our goal).
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jacques Nadeau
> >> >> CTO and Co-Founder, Dremio
> >> >>
> >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >> >>
> >> >>> Thanks for doing the legwork and finding what the other vendors do.
> >> It is
> >> >>> indeed compelling that SQL Server and Postgres go beyond the
> standard
> >> an
> >> >>> make the TABLE keyword optional.
> >> >>>
> >> >>> I tried that syntax in Calcite and discovered that there is a clash
> >> with
> >> >>> one of our own (few) extensions. In
> >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >> EXTENDS
> >> >>> clause. You can write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE goldHandicap < 10;
> >> >>>
> >> >>> to tell Calcite that there are two undeclared columns in the Emp
> table
> >> >> but
> >> >>> you would like to use them in this particular query. We chose to
> make
> >> the
> >> >>> EXTEND keyword optional, so you could instead write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE goldHandicap < 10;
> >> >>>
> >> >>> That is uncomfortably close to
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> >> >>>
> >> >>> so we would require
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >> >>>
> >> >>> if EmpFunction was a table-function. You could combine the two forms
> >> like
> >> >>> this:
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >> >>> (anotherAttribute INTEGER);
> >> >>>
> >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> should
> >> >> also
> >> >>> ask whether requiring folks to type TABLE is such a hardship.
> >> >>>
> >> >>> Julian
> >> >>>
> >> >>>
> >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >> >>>>
> >> >>>> - Table function syntax: I did a quick search and it seems there's
> no
> >> >>>> consensus about this.
> >> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling
> table
> >> >>>> functions without the table(...) wrapper while Oracle [3] and DB2
> [4]
> >> >>>> expect it.
> >> >>>> MySQL does not have table functions [5]
> >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >> >>>> Would it be reasonable to allow a switch in the grammar generation
> to
> >> >>> have
> >> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL
> like
> >> >>>> syntax (back ticks for identifiers etc)
> >> >>>>
> >> >>>> [1]
> >> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >> >>>> [2]
> >> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >> >>>> [3]
> https://oracle-base.com/articles/misc/pipelined-table-functions
> >> >>>> [4]
> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >> >>>> [5]
> >> >>>>
> >> >>>
> >> >>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >> >>>>
> >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> >> >>>> But that seems too easy to be true. Possibly this method is called
> >> more
> >> >>>> than once (before and after the function has been resolved?)
> >> >>>>
> >> >>>> FYI this would happen only when using named parameter. We do want
> to
> >> >>>> overload in this case, which is why I'm looking into it.
> >> >>>>
> >> >>>> I'll fill a JIRA for my other branch
> >> >>>>
> >> >>>> Julien
> >> >>>>
> >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >> >>>>
> >> >>>>>
> >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >> >>>>>
> >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> >> when
> >> >>>>> there's more than 1 function with the same name.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes; see below.
> >> >>>>>
> >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> just
> >> >>> being
> >> >>>>> able to specify the delimiter for csv files.
> >> >>>>> So it seem the answer to my question 1) is that TableMacros are
> the
> >> >> way
> >> >>> to
> >> >>>>> go.
> >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >> >>>>> necessary?*
> >> >>>>>
> >> >>>>>
> >> >>>>> Consider:
> >> >>>>>
> >> >>>>> select * from myTable as f(x, y)
> >> >>>>> select * from myTable f(x, y)
> >> >>>>> select * from myFunction(x, y)
> >> >>>>>
> >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> Also,
> >> >> if
> >> >>> f
> >> >>>>> is a function with zero arguments, could you invoke it like this?:
> >> >>>>>
> >> >>>>> select * from f
> >> >>>>>
> >> >>>>> I don’t know the actual rationale. But I know that the SQL
> standards
> >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> >> >>>>>
> >> >>>>> I had to fix some things in Calcite to enable this:
> >> >>>>> https://github.com/dremio/calcite/pull/1/files
> >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> in
> >> >>>>> Calcite for the Maze example.
> >> >>>>> Which is why some hooks were missing.
> >> >>>>>
> >> >>>>>
> >> >>>>> Can you log a jira case to track this bug?
> >> >>>>>
> >> >>>>>
> >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >> >>>>> Here is a test that reproduces the problem:
> >> >>>>> https://github.com/apache/calcite/pull/166
> >> >>>>> If we return more than 1 TableFunction with the same name, we get
> a
> >> >> NPE
> >> >>>>> later on.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> JIRA
> >> >> case
> >> >>>>> on resolution of overloaded functions when invoked with named
> >> >> arguments.
> >> >>>>> (It probably applies to all functions, not just table functions.)
> >> The
> >> >>> fix
> >> >>>>> will take a while (if you wait for me to write it).
> >> >>>>>
> >> >>>>> For now please tell your users not to overload. :)
> >> >>>>>
> >> >>>>>
> >> >>>>> Julian
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Julien
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > *Jim Scott*
> >> > Director, Enterprise Strategy & Architecture
> >> > +1 (347) 746-9281
> >> > @kingmesal <https://twitter.com/kingmesal>
> >> >
> >> > <http://www.mapr.com/>
> >> > [image: MapR Technologies] <http://www.mapr.com>
> >> >
> >> > Now Available - Free Hadoop On-Demand Training
> >> > <
> >>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >> >
> >>
> >>
> >
>

Re: select from table with options

Posted by James Taylor <ja...@apache.org>.

Phoenix already supports columns at read-time using the syntax without the
EXTENDS keyword as Julian indicated:
   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
   WHERE goldHandicap < 10;

Changing this by requiring the EXTENDS keyword would create a backward
compatibility problem.

I think it'd be good if both of these extensions worked in Drill & Phoenix
given our Drillix initiative.

On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> My proposal was an a or b using the freemarker template in the grammar,
> not something later.
>
> Actually, put another way: we may want to consider stating that we only
> incorporate SQL standards in our primary grammar. Any extensions should be
> optional grammar. We could simply have grammar plugins in Calcite (the same
> way we plug in external things in Drill).
>
> Trying to get every project to agree on extensions seems like it may be
> hard.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> I can see why Jacques wants this syntax.
>>
>> However a “switch" in a grammar is a bad idea. Grammars need to be
>> predictable. Any variation should happen at validation time, or later.
>>
>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>> tough design discussion.
>>
>> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>> should be.
>>
>> Julian
>>
>>
>> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>> >
>> > Looking at those two examples I agree with Jacques. The first appears
>> more
>> > like a hint from the syntactic sugar point of view.
>> >
>> >
>> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> >
>> >> Since EXTEND is custom functionality, it seems reasonable that we could
>> >> have a switch. Given that SQL Server and Postgres support it seems
>> >> reasonable to support the table functions without the TABLE syntax.
>> >>
>> >> I for one definitely think the TABLE syntax is much more confusing to
>> use,
>> >> especially in the example that we're looking to support, such as:
>> >>
>> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> >> '|', skipFirstRow => true)
>> >>
>> >> This seems much clearer than:
>> >>
>> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>> >> => '|', skipFirstRow => true))
>> >>
>> >> It also looks much more like a hint to the table (which is our goal).
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jacques Nadeau
>> >> CTO and Co-Founder, Dremio
>> >>
>> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>
>> >>> Thanks for doing the legwork and finding what the other vendors do.
>> It is
>> >>> indeed compelling that SQL Server and Postgres go beyond the standard
>> an
>> >>> make the TABLE keyword optional.
>> >>>
>> >>> I tried that syntax in Calcite and discovered that there is a clash
>> with
>> >>> one of our own (few) extensions. In
>> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>> EXTENDS
>> >>> clause. You can write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> to tell Calcite that there are two undeclared columns in the Emp table
>> >> but
>> >>> you would like to use them in this particular query. We chose to make
>> the
>> >>> EXTEND keyword optional, so you could instead write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> That is uncomfortably close to
>> >>>
>> >>>  SELECT *
>> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>
>> >>> so we would require
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>
>> >>> if EmpFunction was a table-function. You could combine the two forms
>> like
>> >>> this:
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>> (anotherAttribute INTEGER);
>> >>>
>> >>> We could revisit whether EXTEND is optional, I suppose. But we should
>> >> also
>> >>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>
>> >>> Julian
>> >>>
>> >>>
>> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> >>>>
>> >>>> - Table function syntax: I did a quick search and it seems there's no
>> >>>> consensus about this.
>> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
>> >>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>> >>>> expect it.
>> >>>> MySQL does not have table functions [5]
>> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>> Would it be reasonable to allow a switch in the grammar generation to
>> >>> have
>> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
>> >>>> syntax (back ticks for identifiers etc)
>> >>>>
>> >>>> [1]
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>> [2]
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>> [5]
>> >>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>
>> >>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>> But that seems too easy to be true. Possibly this method is called
>> more
>> >>>> than once (before and after the function has been resolved?)
>> >>>>
>> >>>> FYI this would happen only when using named parameter. We do want to
>> >>>> overload in this case, which is why I'm looking into it.
>> >>>>
>> >>>> I'll fill a JIRA for my other branch
>> >>>>
>> >>>> Julien
>> >>>>
>> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>> >>>>>
>> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
>> when
>> >>>>> there's more than 1 function with the same name.
>> >>>>>
>> >>>>>
>> >>>>> Yes; see below.
>> >>>>>
>> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
>> >>> being
>> >>>>> able to specify the delimiter for csv files.
>> >>>>> So it seem the answer to my question 1) is that TableMacros are the
>> >> way
>> >>> to
>> >>>>> go.
>> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>> necessary?*
>> >>>>>
>> >>>>>
>> >>>>> Consider:
>> >>>>>
>> >>>>> select * from myTable as f(x, y)
>> >>>>> select * from myTable f(x, y)
>> >>>>> select * from myFunction(x, y)
>> >>>>>
>> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
>> >> if
>> >>> f
>> >>>>> is a function with zero arguments, could you invoke it like this?:
>> >>>>>
>> >>>>> select * from f
>> >>>>>
>> >>>>> I don’t know the actual rationale. But I know that the SQL standards
>> >>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>
>> >>>>> I had to fix some things in Calcite to enable this:
>> >>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> >>>>> Calcite for the Maze example.
>> >>>>> Which is why some hooks were missing.
>> >>>>>
>> >>>>>
>> >>>>> Can you log a jira case to track this bug?
>> >>>>>
>> >>>>>
>> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>> Here is a test that reproduces the problem:
>> >>>>> https://github.com/apache/calcite/pull/166
>> >>>>> If we return more than 1 TableFunction with the same name, we get a
>> >> NPE
>> >>>>> later on.
>> >>>>>
>> >>>>>
>> >>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
>> >> case
>> >>>>> on resolution of overloaded functions when invoked with named
>> >> arguments.
>> >>>>> (It probably applies to all functions, not just table functions.)
>> The
>> >>> fix
>> >>>>> will take a while (if you wait for me to write it).
>> >>>>>
>> >>>>> For now please tell your users not to overload. :)
>> >>>>>
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Julien
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > *Jim Scott*
>> > Director, Enterprise Strategy & Architecture
>> > +1 (347) 746-9281
>> > @kingmesal <https://twitter.com/kingmesal>
>> >
>> > <http://www.mapr.com/>
>> > [image: MapR Technologies] <http://www.mapr.com>
>> >
>> > Now Available - Free Hadoop On-Demand Training
>> > <
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> >
>>
>>
>

Re: select from table with options

Posted by James Taylor <ja...@apache.org>.

Phoenix already supports columns at read-time using the syntax without the
EXTENDS keyword as Julian indicated:
   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
   WHERE goldHandicap < 10;

Changing this by requiring the EXTENDS keyword would create a backward
compatibility problem.

I think it'd be good if both of these extensions worked in Drill & Phoenix
given our Drillix initiative.

On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> My proposal was an a or b using the freemarker template in the grammar,
> not something later.
>
> Actually, put another way: we may want to consider stating that we only
> incorporate SQL standards in our primary grammar. Any extensions should be
> optional grammar. We could simply have grammar plugins in Calcite (the same
> way we plug in external things in Drill).
>
> Trying to get every project to agree on extensions seems like it may be
> hard.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> I can see why Jacques wants this syntax.
>>
>> However a “switch" in a grammar is a bad idea. Grammars need to be
>> predictable. Any variation should happen at validation time, or later.
>>
>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>> tough design discussion.
>>
>> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>> should be.
>>
>> Julian
>>
>>
>> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>> >
>> > Looking at those two examples I agree with Jacques. The first appears
>> more
>> > like a hint from the syntactic sugar point of view.
>> >
>> >
>> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> >
>> >> Since EXTEND is custom functionality, it seems reasonable that we could
>> >> have a switch. Given that SQL Server and Postgres support it seems
>> >> reasonable to support the table functions without the TABLE syntax.
>> >>
>> >> I for one definitely think the TABLE syntax is much more confusing to
>> use,
>> >> especially in the example that we're looking to support, such as:
>> >>
>> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> >> '|', skipFirstRow => true)
>> >>
>> >> This seems much clearer than:
>> >>
>> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>> >> => '|', skipFirstRow => true))
>> >>
>> >> It also looks much more like a hint to the table (which is our goal).
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jacques Nadeau
>> >> CTO and Co-Founder, Dremio
>> >>
>> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>
>> >>> Thanks for doing the legwork and finding what the other vendors do.
>> It is
>> >>> indeed compelling that SQL Server and Postgres go beyond the standard
>> an
>> >>> make the TABLE keyword optional.
>> >>>
>> >>> I tried that syntax in Calcite and discovered that there is a clash
>> with
>> >>> one of our own (few) extensions. In
>> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>> EXTENDS
>> >>> clause. You can write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> to tell Calcite that there are two undeclared columns in the Emp table
>> >> but
>> >>> you would like to use them in this particular query. We chose to make
>> the
>> >>> EXTEND keyword optional, so you could instead write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> That is uncomfortably close to
>> >>>
>> >>>  SELECT *
>> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>
>> >>> so we would require
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>
>> >>> if EmpFunction was a table-function. You could combine the two forms
>> like
>> >>> this:
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>> (anotherAttribute INTEGER);
>> >>>
>> >>> We could revisit whether EXTEND is optional, I suppose. But we should
>> >> also
>> >>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>
>> >>> Julian
>> >>>
>> >>>
>> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> >>>>
>> >>>> - Table function syntax: I did a quick search and it seems there's no
>> >>>> consensus about this.
>> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
>> >>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>> >>>> expect it.
>> >>>> MySQL does not have table functions [5]
>> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>> Would it be reasonable to allow a switch in the grammar generation to
>> >>> have
>> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
>> >>>> syntax (back ticks for identifiers etc)
>> >>>>
>> >>>> [1]
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>> [2]
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>> [5]
>> >>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>
>> >>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>> But that seems too easy to be true. Possibly this method is called
>> more
>> >>>> than once (before and after the function has been resolved?)
>> >>>>
>> >>>> FYI this would happen only when using named parameter. We do want to
>> >>>> overload in this case, which is why I'm looking into it.
>> >>>>
>> >>>> I'll fill a JIRA for my other branch
>> >>>>
>> >>>> Julien
>> >>>>
>> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>> >>>>>
>> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
>> when
>> >>>>> there's more than 1 function with the same name.
>> >>>>>
>> >>>>>
>> >>>>> Yes; see below.
>> >>>>>
>> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
>> >>> being
>> >>>>> able to specify the delimiter for csv files.
>> >>>>> So it seem the answer to my question 1) is that TableMacros are the
>> >> way
>> >>> to
>> >>>>> go.
>> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>> necessary?*
>> >>>>>
>> >>>>>
>> >>>>> Consider:
>> >>>>>
>> >>>>> select * from myTable as f(x, y)
>> >>>>> select * from myTable f(x, y)
>> >>>>> select * from myFunction(x, y)
>> >>>>>
>> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
>> >> if
>> >>> f
>> >>>>> is a function with zero arguments, could you invoke it like this?:
>> >>>>>
>> >>>>> select * from f
>> >>>>>
>> >>>>> I don’t know the actual rationale. But I know that the SQL standards
>> >>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>
>> >>>>> I had to fix some things in Calcite to enable this:
>> >>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> >>>>> Calcite for the Maze example.
>> >>>>> Which is why some hooks were missing.
>> >>>>>
>> >>>>>
>> >>>>> Can you log a jira case to track this bug?
>> >>>>>
>> >>>>>
>> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>> Here is a test that reproduces the problem:
>> >>>>> https://github.com/apache/calcite/pull/166
>> >>>>> If we return more than 1 TableFunction with the same name, we get a
>> >> NPE
>> >>>>> later on.
>> >>>>>
>> >>>>>
>> >>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
>> >> case
>> >>>>> on resolution of overloaded functions when invoked with named
>> >> arguments.
>> >>>>> (It probably applies to all functions, not just table functions.)
>> The
>> >>> fix
>> >>>>> will take a while (if you wait for me to write it).
>> >>>>>
>> >>>>> For now please tell your users not to overload. :)
>> >>>>>
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Julien
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > *Jim Scott*
>> > Director, Enterprise Strategy & Architecture
>> > +1 (347) 746-9281
>> > @kingmesal <https://twitter.com/kingmesal>
>> >
>> > <http://www.mapr.com/>
>> > [image: MapR Technologies] <http://www.mapr.com>
>> >
>> > Now Available - Free Hadoop On-Demand Training
>> > <
>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
>> >
>>
>>
>

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

My proposal was an a or b using the freemarker template in the grammar, not
something later.

Actually, put another way: we may want to consider stating that we only
incorporate SQL standards in our primary grammar. Any extensions should be
optional grammar. We could simply have grammar plugins in Calcite (the same
way we plug in external things in Drill).

Trying to get every project to agree on extensions seems like it may be
hard.



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:

> I can see why Jacques wants this syntax.
>
> However a “switch" in a grammar is a bad idea. Grammars need to be
> predictable. Any variation should happen at validation time, or later.
>
> Also, we shouldn’t add configuration parameters as a way of avoiding a
> tough design discussion.
>
> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> which I mean Jacques and James, I guess) need to agree what the SQL syntax
> should be.
>
> Julian
>
>
> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >
> > Looking at those two examples I agree with Jacques. The first appears
> more
> > like a hint from the syntactic sugar point of view.
> >
> >
> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> >> Since EXTEND is custom functionality, it seems reasonable that we could
> >> have a switch. Given that SQL Server and Postgres support it seems
> >> reasonable to support the table functions without the TABLE syntax.
> >>
> >> I for one definitely think the TABLE syntax is much more confusing to
> use,
> >> especially in the example that we're looking to support, such as:
> >>
> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> >> '|', skipFirstRow => true)
> >>
> >> This seems much clearer than:
> >>
> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> fieldDelimiter
> >> => '|', skipFirstRow => true))
> >>
> >> It also looks much more like a hint to the table (which is our goal).
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Jacques Nadeau
> >> CTO and Co-Founder, Dremio
> >>
> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> Thanks for doing the legwork and finding what the other vendors do. It
> is
> >>> indeed compelling that SQL Server and Postgres go beyond the standard
> an
> >>> make the TABLE keyword optional.
> >>>
> >>> I tried that syntax in Calcite and discovered that there is a clash
> with
> >>> one of our own (few) extensions. In
> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> >>> clause. You can write
> >>>
> >>>  SELECT *
> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>  WHERE goldHandicap < 10;
> >>>
> >>> to tell Calcite that there are two undeclared columns in the Emp table
> >> but
> >>> you would like to use them in this particular query. We chose to make
> the
> >>> EXTEND keyword optional, so you could instead write
> >>>
> >>>  SELECT *
> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>  WHERE goldHandicap < 10;
> >>>
> >>> That is uncomfortably close to
> >>>
> >>>  SELECT *
> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> >>>
> >>> so we would require
> >>>
> >>>  SELECT *
> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>
> >>> if EmpFunction was a table-function. You could combine the two forms
> like
> >>> this:
> >>>
> >>>  SELECT *
> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>> (anotherAttribute INTEGER);
> >>>
> >>> We could revisit whether EXTEND is optional, I suppose. But we should
> >> also
> >>> ask whether requiring folks to type TABLE is such a hardship.
> >>>
> >>> Julian
> >>>
> >>>
> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>>>
> >>>> - Table function syntax: I did a quick search and it seems there's no
> >>>> consensus about this.
> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
> >>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> >>>> expect it.
> >>>> MySQL does not have table functions [5]
> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>> Would it be reasonable to allow a switch in the grammar generation to
> >>> have
> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
> >>>> syntax (back ticks for identifiers etc)
> >>>>
> >>>> [1]
> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>> [2]
> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>> [5]
> >>>>
> >>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>
> >>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>> But that seems too easy to be true. Possibly this method is called
> more
> >>>> than once (before and after the function has been resolved?)
> >>>>
> >>>> FYI this would happen only when using named parameter. We do want to
> >>>> overload in this case, which is why I'm looking into it.
> >>>>
> >>>> I'll fill a JIRA for my other branch
> >>>>
> >>>> Julien
> >>>>
> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>>>>
> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> when
> >>>>> there's more than 1 function with the same name.
> >>>>>
> >>>>>
> >>>>> Yes; see below.
> >>>>>
> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
> >>> being
> >>>>> able to specify the delimiter for csv files.
> >>>>> So it seem the answer to my question 1) is that TableMacros are the
> >> way
> >>> to
> >>>>> go.
> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>> necessary?*
> >>>>>
> >>>>>
> >>>>> Consider:
> >>>>>
> >>>>> select * from myTable as f(x, y)
> >>>>> select * from myTable f(x, y)
> >>>>> select * from myFunction(x, y)
> >>>>>
> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
> >> if
> >>> f
> >>>>> is a function with zero arguments, could you invoke it like this?:
> >>>>>
> >>>>> select * from f
> >>>>>
> >>>>> I don’t know the actual rationale. But I know that the SQL standards
> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>
> >>>>> I had to fix some things in Calcite to enable this:
> >>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >>>>> Calcite for the Maze example.
> >>>>> Which is why some hooks were missing.
> >>>>>
> >>>>>
> >>>>> Can you log a jira case to track this bug?
> >>>>>
> >>>>>
> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>> Here is a test that reproduces the problem:
> >>>>> https://github.com/apache/calcite/pull/166
> >>>>> If we return more than 1 TableFunction with the same name, we get a
> >> NPE
> >>>>> later on.
> >>>>>
> >>>>>
> >>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
> >> case
> >>>>> on resolution of overloaded functions when invoked with named
> >> arguments.
> >>>>> (It probably applies to all functions, not just table functions.) The
> >>> fix
> >>>>> will take a while (if you wait for me to write it).
> >>>>>
> >>>>> For now please tell your users not to overload. :)
> >>>>>
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Julien
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> > +1 (347) 746-9281
> > @kingmesal <https://twitter.com/kingmesal>
> >
> > <http://www.mapr.com/>
> > [image: MapR Technologies] <http://www.mapr.com>
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>
>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

I can see why Jacques wants this syntax.

However a “switch" in a grammar is a bad idea. Grammars need to be predictable. Any variation should happen at validation time, or later.

Also, we shouldn’t add configuration parameters as a way of avoiding a tough design discussion. 

EXTENDS and eliding TABLE are both extensions to standard SQL, and they are both applicable to Drill and Phoenix. I think Drill and Phoenix (by which I mean Jacques and James, I guess) need to agree what the SQL syntax should be.

Julian


> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> 
> Looking at those two examples I agree with Jacques. The first appears more
> like a hint from the syntactic sugar point of view.
> 
> 
> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> 
>> Since EXTEND is custom functionality, it seems reasonable that we could
>> have a switch. Given that SQL Server and Postgres support it seems
>> reasonable to support the table functions without the TABLE syntax.
>> 
>> I for one definitely think the TABLE syntax is much more confusing to use,
>> especially in the example that we're looking to support, such as:
>> 
>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> '|', skipFirstRow => true)
>> 
>> This seems much clearer than:
>> 
>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
>> => '|', skipFirstRow => true))
>> 
>> It also looks much more like a hint to the table (which is our goal).
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>> 
>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>> indeed compelling that SQL Server and Postgres go beyond the standard an
>>> make the TABLE keyword optional.
>>> 
>>> I tried that syntax in Calcite and discovered that there is a clash with
>>> one of our own (few) extensions. In
>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
>>> clause. You can write
>>> 
>>>  SELECT *
>>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE goldHandicap < 10;
>>> 
>>> to tell Calcite that there are two undeclared columns in the Emp table
>> but
>>> you would like to use them in this particular query. We chose to make the
>>> EXTEND keyword optional, so you could instead write
>>> 
>>>  SELECT *
>>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE goldHandicap < 10;
>>> 
>>> That is uncomfortably close to
>>> 
>>>  SELECT *
>>>  FROM EmpFunction (favoriteBand, golfHandicap);
>>> 
>>> so we would require
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>> 
>>> if EmpFunction was a table-function. You could combine the two forms like
>>> this:
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>> (anotherAttribute INTEGER);
>>> 
>>> We could revisit whether EXTEND is optional, I suppose. But we should
>> also
>>> ask whether requiring folks to type TABLE is such a hardship.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>> 
>>>> - Table function syntax: I did a quick search and it seems there's no
>>>> consensus about this.
>>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>>>> expect it.
>>>> MySQL does not have table functions [5]
>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>> Would it be reasonable to allow a switch in the grammar generation to
>>> have
>>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
>>>> syntax (back ticks for identifiers etc)
>>>> 
>>>> [1]
>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>> [2]
>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>> [5]
>>>> 
>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>> 
>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>> But that seems too easy to be true. Possibly this method is called more
>>>> than once (before and after the function has been resolved?)
>>>> 
>>>> FYI this would happen only when using named parameter. We do want to
>>>> overload in this case, which is why I'm looking into it.
>>>> 
>>>> I'll fill a JIRA for my other branch
>>>> 
>>>> Julien
>>>> 
>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> 
>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>>>>> there's more than 1 function with the same name.
>>>>> 
>>>>> 
>>>>> Yes; see below.
>>>>> 
>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
>>> being
>>>>> able to specify the delimiter for csv files.
>>>>> So it seem the answer to my question 1) is that TableMacros are the
>> way
>>> to
>>>>> go.
>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>> necessary?*
>>>>> 
>>>>> 
>>>>> Consider:
>>>>> 
>>>>> select * from myTable as f(x, y)
>>>>> select * from myTable f(x, y)
>>>>> select * from myFunction(x, y)
>>>>> 
>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
>> if
>>> f
>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>> 
>>>>> select * from f
>>>>> 
>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>> 
>>>>> I had to fix some things in Calcite to enable this:
>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>>>>> Calcite for the Maze example.
>>>>> Which is why some hooks were missing.
>>>>> 
>>>>> 
>>>>> Can you log a jira case to track this bug?
>>>>> 
>>>>> 
>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>> Here is a test that reproduces the problem:
>>>>> https://github.com/apache/calcite/pull/166
>>>>> If we return more than 1 TableFunction with the same name, we get a
>> NPE
>>>>> later on.
>>>>> 
>>>>> 
>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
>> case
>>>>> on resolution of overloaded functions when invoked with named
>> arguments.
>>>>> (It probably applies to all functions, not just table functions.) The
>>> fix
>>>>> will take a while (if you wait for me to write it).
>>>>> 
>>>>> For now please tell your users not to overload. :)
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Julien
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
> @kingmesal <https://twitter.com/kingmesal>
> 
> <http://www.mapr.com/>
> [image: MapR Technologies] <http://www.mapr.com>
> 
> Now Available - Free Hadoop On-Demand Training
> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

I can see why Jacques wants this syntax.

However a “switch" in a grammar is a bad idea. Grammars need to be predictable. Any variation should happen at validation time, or later.

Also, we shouldn’t add configuration parameters as a way of avoiding a tough design discussion. 

EXTENDS and eliding TABLE are both extensions to standard SQL, and they are both applicable to Drill and Phoenix. I think Drill and Phoenix (by which I mean Jacques and James, I guess) need to agree what the SQL syntax should be.

Julian


> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> 
> Looking at those two examples I agree with Jacques. The first appears more
> like a hint from the syntactic sugar point of view.
> 
> 
> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> 
>> Since EXTEND is custom functionality, it seems reasonable that we could
>> have a switch. Given that SQL Server and Postgres support it seems
>> reasonable to support the table functions without the TABLE syntax.
>> 
>> I for one definitely think the TABLE syntax is much more confusing to use,
>> especially in the example that we're looking to support, such as:
>> 
>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> '|', skipFirstRow => true)
>> 
>> This seems much clearer than:
>> 
>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
>> => '|', skipFirstRow => true))
>> 
>> It also looks much more like a hint to the table (which is our goal).
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>> 
>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>> indeed compelling that SQL Server and Postgres go beyond the standard an
>>> make the TABLE keyword optional.
>>> 
>>> I tried that syntax in Calcite and discovered that there is a clash with
>>> one of our own (few) extensions. In
>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
>>> clause. You can write
>>> 
>>>  SELECT *
>>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE goldHandicap < 10;
>>> 
>>> to tell Calcite that there are two undeclared columns in the Emp table
>> but
>>> you would like to use them in this particular query. We chose to make the
>>> EXTEND keyword optional, so you could instead write
>>> 
>>>  SELECT *
>>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE goldHandicap < 10;
>>> 
>>> That is uncomfortably close to
>>> 
>>>  SELECT *
>>>  FROM EmpFunction (favoriteBand, golfHandicap);
>>> 
>>> so we would require
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>> 
>>> if EmpFunction was a table-function. You could combine the two forms like
>>> this:
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>> (anotherAttribute INTEGER);
>>> 
>>> We could revisit whether EXTEND is optional, I suppose. But we should
>> also
>>> ask whether requiring folks to type TABLE is such a hardship.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>> 
>>>> - Table function syntax: I did a quick search and it seems there's no
>>>> consensus about this.
>>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>>>> expect it.
>>>> MySQL does not have table functions [5]
>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>> Would it be reasonable to allow a switch in the grammar generation to
>>> have
>>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
>>>> syntax (back ticks for identifiers etc)
>>>> 
>>>> [1]
>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>> [2]
>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>> [5]
>>>> 
>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>> 
>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>> But that seems too easy to be true. Possibly this method is called more
>>>> than once (before and after the function has been resolved?)
>>>> 
>>>> FYI this would happen only when using named parameter. We do want to
>>>> overload in this case, which is why I'm looking into it.
>>>> 
>>>> I'll fill a JIRA for my other branch
>>>> 
>>>> Julien
>>>> 
>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> 
>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>>>>> there's more than 1 function with the same name.
>>>>> 
>>>>> 
>>>>> Yes; see below.
>>>>> 
>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
>>> being
>>>>> able to specify the delimiter for csv files.
>>>>> So it seem the answer to my question 1) is that TableMacros are the
>> way
>>> to
>>>>> go.
>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>> necessary?*
>>>>> 
>>>>> 
>>>>> Consider:
>>>>> 
>>>>> select * from myTable as f(x, y)
>>>>> select * from myTable f(x, y)
>>>>> select * from myFunction(x, y)
>>>>> 
>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
>> if
>>> f
>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>> 
>>>>> select * from f
>>>>> 
>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>> 
>>>>> I had to fix some things in Calcite to enable this:
>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>>>>> Calcite for the Maze example.
>>>>> Which is why some hooks were missing.
>>>>> 
>>>>> 
>>>>> Can you log a jira case to track this bug?
>>>>> 
>>>>> 
>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>> Here is a test that reproduces the problem:
>>>>> https://github.com/apache/calcite/pull/166
>>>>> If we return more than 1 TableFunction with the same name, we get a
>> NPE
>>>>> later on.
>>>>> 
>>>>> 
>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
>> case
>>>>> on resolution of overloaded functions when invoked with named
>> arguments.
>>>>> (It probably applies to all functions, not just table functions.) The
>>> fix
>>>>> will take a while (if you wait for me to write it).
>>>>> 
>>>>> For now please tell your users not to overload. :)
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Julien
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
> @kingmesal <https://twitter.com/kingmesal>
> 
> <http://www.mapr.com/>
> [image: MapR Technologies] <http://www.mapr.com>
> 
> Now Available - Free Hadoop On-Demand Training
> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Jim Scott <js...@maprtech.com>.

Looking at those two examples I agree with Jacques. The first appears more
like a hint from the syntactic sugar point of view.


On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Since EXTEND is custom functionality, it seems reasonable that we could
> have a switch. Given that SQL Server and Postgres support it seems
> reasonable to support the table functions without the TABLE syntax.
>
> I for one definitely think the TABLE syntax is much more confusing to use,
> especially in the example that we're looking to support, such as:
>
> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> '|', skipFirstRow => true)
>
> This seems much clearer than:
>
> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> => '|', skipFirstRow => true))
>
> It also looks much more like a hint to the table (which is our goal).
>
>
>
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Thanks for doing the legwork and finding what the other vendors do. It is
> > indeed compelling that SQL Server and Postgres go beyond the standard an
> > make the TABLE keyword optional.
> >
> > I tried that syntax in Calcite and discovered that there is a clash with
> > one of our own (few) extensions. In
> > https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> > clause. You can write
> >
> >   SELECT *
> >   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE goldHandicap < 10;
> >
> > to tell Calcite that there are two undeclared columns in the Emp table
> but
> > you would like to use them in this particular query. We chose to make the
> > EXTEND keyword optional, so you could instead write
> >
> >   SELECT *
> >   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE goldHandicap < 10;
> >
> > That is uncomfortably close to
> >
> >   SELECT *
> >   FROM EmpFunction (favoriteBand, golfHandicap);
> >
> > so we would require
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >
> > if EmpFunction was a table-function. You could combine the two forms like
> > this:
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > (anotherAttribute INTEGER);
> >
> > We could revisit whether EXTEND is optional, I suppose. But we should
> also
> > ask whether requiring folks to type TABLE is such a hardship.
> >
> > Julian
> >
> >
> > > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >
> > > - Table function syntax: I did a quick search and it seems there's no
> > > consensus about this.
> > > It seems that Posgres [1] and SQL Server [2] both allow calling table
> > > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > > expect it.
> > > MySQL does not have table functions [5]
> > > 2 for, 2 against and 1 undecided: that's a draw :)
> > > Would it be reasonable to allow a switch in the grammar generation to
> > have
> > > a posgres compatible syntax? Currently in Drill we use the MySQL like
> > > syntax (back ticks for identifiers etc)
> > >
> > > [1]
> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > > [2]
> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > > [5]
> > >
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >
> > > - It seems a simple change in SqlCallBinding fixes the function
> > > overloading: https://github.com/apache/calcite/pull/166/files
> > > But that seems too easy to be true. Possibly this method is called more
> > > than once (before and after the function has been resolved?)
> > >
> > > FYI this would happen only when using named parameter. We do want to
> > > overload in this case, which is why I'm looking into it.
> > >
> > > I'll fill a JIRA for my other branch
> > >
> > > Julien
> > >
> > > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >>
> > >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >>
> > >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> > >> there's more than 1 function with the same name.
> > >>
> > >>
> > >> Yes; see below.
> > >>
> > >> FYI: I have a prototype of TableMacro working in Drill. For now just
> > being
> > >> able to specify the delimiter for csv files.
> > >> So it seem the answer to my question 1) is that TableMacros are the
> way
> > to
> > >> go.
> > >> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> necessary?*
> > >>
> > >>
> > >> Consider:
> > >>
> > >> select * from myTable as f(x, y)
> > >> select * from myTable f(x, y)
> > >> select * from myFunction(x, y)
> > >>
> > >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
> if
> > f
> > >> is a function with zero arguments, could you invoke it like this?:
> > >>
> > >> select * from f
> > >>
> > >> I don’t know the actual rationale. But I know that the SQL standards
> > >> people in their wisdom decided to add a keyword to disambiguate.
> > >>
> > >> I had to fix some things in Calcite to enable this:
> > >> https://github.com/dremio/calcite/pull/1/files
> > >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> > >> Calcite for the Maze example.
> > >> Which is why some hooks were missing.
> > >>
> > >>
> > >> Can you log a jira case to track this bug?
> > >>
> > >>
> > >> I think I found a bug in Calcite but I'd need help to fix it.
> > >> Here is a test that reproduces the problem:
> > >> https://github.com/apache/calcite/pull/166
> > >> If we return more than 1 TableFunction with the same name, we get a
> NPE
> > >> later on.
> > >>
> > >>
> > >> Yes, I knew there was a problem with overloading. Please log a JIRA
> case
> > >> on resolution of overloaded functions when invoked with named
> arguments.
> > >> (It probably applies to all functions, not just table functions.) The
> > fix
> > >> will take a while (if you wait for me to write it).
> > >>
> > >> For now please tell your users not to overload. :)
> > >>
> > >>
> > >> Julian
> > >>
> > >>
> > >
> > >
> > > --
> > > Julien
> >
> >
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>

<http://www.mapr.com/>
[image: MapR Technologies] <http://www.mapr.com>

Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Jim Scott <js...@maprtech.com>.

Looking at those two examples I agree with Jacques. The first appears more
like a hint from the syntactic sugar point of view.


On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Since EXTEND is custom functionality, it seems reasonable that we could
> have a switch. Given that SQL Server and Postgres support it seems
> reasonable to support the table functions without the TABLE syntax.
>
> I for one definitely think the TABLE syntax is much more confusing to use,
> especially in the example that we're looking to support, such as:
>
> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> '|', skipFirstRow => true)
>
> This seems much clearer than:
>
> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> => '|', skipFirstRow => true))
>
> It also looks much more like a hint to the table (which is our goal).
>
>
>
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Thanks for doing the legwork and finding what the other vendors do. It is
> > indeed compelling that SQL Server and Postgres go beyond the standard an
> > make the TABLE keyword optional.
> >
> > I tried that syntax in Calcite and discovered that there is a clash with
> > one of our own (few) extensions. In
> > https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> > clause. You can write
> >
> >   SELECT *
> >   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE goldHandicap < 10;
> >
> > to tell Calcite that there are two undeclared columns in the Emp table
> but
> > you would like to use them in this particular query. We chose to make the
> > EXTEND keyword optional, so you could instead write
> >
> >   SELECT *
> >   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE goldHandicap < 10;
> >
> > That is uncomfortably close to
> >
> >   SELECT *
> >   FROM EmpFunction (favoriteBand, golfHandicap);
> >
> > so we would require
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >
> > if EmpFunction was a table-function. You could combine the two forms like
> > this:
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > (anotherAttribute INTEGER);
> >
> > We could revisit whether EXTEND is optional, I suppose. But we should
> also
> > ask whether requiring folks to type TABLE is such a hardship.
> >
> > Julian
> >
> >
> > > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >
> > > - Table function syntax: I did a quick search and it seems there's no
> > > consensus about this.
> > > It seems that Posgres [1] and SQL Server [2] both allow calling table
> > > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > > expect it.
> > > MySQL does not have table functions [5]
> > > 2 for, 2 against and 1 undecided: that's a draw :)
> > > Would it be reasonable to allow a switch in the grammar generation to
> > have
> > > a posgres compatible syntax? Currently in Drill we use the MySQL like
> > > syntax (back ticks for identifiers etc)
> > >
> > > [1]
> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > > [2]
> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > > [5]
> > >
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >
> > > - It seems a simple change in SqlCallBinding fixes the function
> > > overloading: https://github.com/apache/calcite/pull/166/files
> > > But that seems too easy to be true. Possibly this method is called more
> > > than once (before and after the function has been resolved?)
> > >
> > > FYI this would happen only when using named parameter. We do want to
> > > overload in this case, which is why I'm looking into it.
> > >
> > > I'll fill a JIRA for my other branch
> > >
> > > Julien
> > >
> > > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >>
> > >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >>
> > >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> > >> there's more than 1 function with the same name.
> > >>
> > >>
> > >> Yes; see below.
> > >>
> > >> FYI: I have a prototype of TableMacro working in Drill. For now just
> > being
> > >> able to specify the delimiter for csv files.
> > >> So it seem the answer to my question 1) is that TableMacros are the
> way
> > to
> > >> go.
> > >> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> necessary?*
> > >>
> > >>
> > >> Consider:
> > >>
> > >> select * from myTable as f(x, y)
> > >> select * from myTable f(x, y)
> > >> select * from myFunction(x, y)
> > >>
> > >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
> if
> > f
> > >> is a function with zero arguments, could you invoke it like this?:
> > >>
> > >> select * from f
> > >>
> > >> I don’t know the actual rationale. But I know that the SQL standards
> > >> people in their wisdom decided to add a keyword to disambiguate.
> > >>
> > >> I had to fix some things in Calcite to enable this:
> > >> https://github.com/dremio/calcite/pull/1/files
> > >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> > >> Calcite for the Maze example.
> > >> Which is why some hooks were missing.
> > >>
> > >>
> > >> Can you log a jira case to track this bug?
> > >>
> > >>
> > >> I think I found a bug in Calcite but I'd need help to fix it.
> > >> Here is a test that reproduces the problem:
> > >> https://github.com/apache/calcite/pull/166
> > >> If we return more than 1 TableFunction with the same name, we get a
> NPE
> > >> later on.
> > >>
> > >>
> > >> Yes, I knew there was a problem with overloading. Please log a JIRA
> case
> > >> on resolution of overloaded functions when invoked with named
> arguments.
> > >> (It probably applies to all functions, not just table functions.) The
> > fix
> > >> will take a while (if you wait for me to write it).
> > >>
> > >> For now please tell your users not to overload. :)
> > >>
> > >>
> > >> Julian
> > >>
> > >>
> > >
> > >
> > > --
> > > Julien
> >
> >
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>

<http://www.mapr.com/>
[image: MapR Technologies] <http://www.mapr.com>

Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Since EXTEND is custom functionality, it seems reasonable that we could
have a switch. Given that SQL Server and Postgres support it seems
reasonable to support the table functions without the TABLE syntax.

I for one definitely think the TABLE syntax is much more confusing to use,
especially in the example that we're looking to support, such as:

select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
'|', skipFirstRow => true)

This seems much clearer than:

select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
=> '|', skipFirstRow => true))

It also looks much more like a hint to the table (which is our goal).







--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for doing the legwork and finding what the other vendors do. It is
> indeed compelling that SQL Server and Postgres go beyond the standard an
> make the TABLE keyword optional.
>
> I tried that syntax in Calcite and discovered that there is a clash with
> one of our own (few) extensions. In
> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> clause. You can write
>
>   SELECT *
>   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> to tell Calcite that there are two undeclared columns in the Emp table but
> you would like to use them in this particular query. We chose to make the
> EXTEND keyword optional, so you could instead write
>
>   SELECT *
>   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> That is uncomfortably close to
>
>   SELECT *
>   FROM EmpFunction (favoriteBand, golfHandicap);
>
> so we would require
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>
> if EmpFunction was a table-function. You could combine the two forms like
> this:
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> (anotherAttribute INTEGER);
>
> We could revisit whether EXTEND is optional, I suppose. But we should also
> ask whether requiring folks to type TABLE is such a hardship.
>
> Julian
>
>
> > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > - Table function syntax: I did a quick search and it seems there's no
> > consensus about this.
> > It seems that Posgres [1] and SQL Server [2] both allow calling table
> > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > expect it.
> > MySQL does not have table functions [5]
> > 2 for, 2 against and 1 undecided: that's a draw :)
> > Would it be reasonable to allow a switch in the grammar generation to
> have
> > a posgres compatible syntax? Currently in Drill we use the MySQL like
> > syntax (back ticks for identifiers etc)
> >
> > [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > [5]
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >
> > - It seems a simple change in SqlCallBinding fixes the function
> > overloading: https://github.com/apache/calcite/pull/166/files
> > But that seems too easy to be true. Possibly this method is called more
> > than once (before and after the function has been resolved?)
> >
> > FYI this would happen only when using named parameter. We do want to
> > overload in this case, which is why I'm looking into it.
> >
> > I'll fill a JIRA for my other branch
> >
> > Julien
> >
> > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >>
> >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>
> >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> >> there's more than 1 function with the same name.
> >>
> >>
> >> Yes; see below.
> >>
> >> FYI: I have a prototype of TableMacro working in Drill. For now just
> being
> >> able to specify the delimiter for csv files.
> >> So it seem the answer to my question 1) is that TableMacros are the way
> to
> >> go.
> >> I'm still wondering about *3) is the table(...) wrapping syntax
> >> necessary?*
> >>
> >>
> >> Consider:
> >>
> >> select * from myTable as f(x, y)
> >> select * from myTable f(x, y)
> >> select * from myFunction(x, y)
> >>
> >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if
> f
> >> is a function with zero arguments, could you invoke it like this?:
> >>
> >> select * from f
> >>
> >> I don’t know the actual rationale. But I know that the SQL standards
> >> people in their wisdom decided to add a keyword to disambiguate.
> >>
> >> I had to fix some things in Calcite to enable this:
> >> https://github.com/dremio/calcite/pull/1/files
> >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >> Calcite for the Maze example.
> >> Which is why some hooks were missing.
> >>
> >>
> >> Can you log a jira case to track this bug?
> >>
> >>
> >> I think I found a bug in Calcite but I'd need help to fix it.
> >> Here is a test that reproduces the problem:
> >> https://github.com/apache/calcite/pull/166
> >> If we return more than 1 TableFunction with the same name, we get a NPE
> >> later on.
> >>
> >>
> >> Yes, I knew there was a problem with overloading. Please log a JIRA case
> >> on resolution of overloaded functions when invoked with named arguments.
> >> (It probably applies to all functions, not just table functions.) The
> fix
> >> will take a while (if you wait for me to write it).
> >>
> >> For now please tell your users not to overload. :)
> >>
> >>
> >> Julian
> >>
> >>
> >
> >
> > --
> > Julien
>
>

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Since EXTEND is custom functionality, it seems reasonable that we could
have a switch. Given that SQL Server and Postgres support it seems
reasonable to support the table functions without the TABLE syntax.

I for one definitely think the TABLE syntax is much more confusing to use,
especially in the example that we're looking to support, such as:

select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
'|', skipFirstRow => true)

This seems much clearer than:

select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
=> '|', skipFirstRow => true))

It also looks much more like a hint to the table (which is our goal).







--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for doing the legwork and finding what the other vendors do. It is
> indeed compelling that SQL Server and Postgres go beyond the standard an
> make the TABLE keyword optional.
>
> I tried that syntax in Calcite and discovered that there is a clash with
> one of our own (few) extensions. In
> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> clause. You can write
>
>   SELECT *
>   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> to tell Calcite that there are two undeclared columns in the Emp table but
> you would like to use them in this particular query. We chose to make the
> EXTEND keyword optional, so you could instead write
>
>   SELECT *
>   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> That is uncomfortably close to
>
>   SELECT *
>   FROM EmpFunction (favoriteBand, golfHandicap);
>
> so we would require
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>
> if EmpFunction was a table-function. You could combine the two forms like
> this:
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> (anotherAttribute INTEGER);
>
> We could revisit whether EXTEND is optional, I suppose. But we should also
> ask whether requiring folks to type TABLE is such a hardship.
>
> Julian
>
>
> > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > - Table function syntax: I did a quick search and it seems there's no
> > consensus about this.
> > It seems that Posgres [1] and SQL Server [2] both allow calling table
> > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > expect it.
> > MySQL does not have table functions [5]
> > 2 for, 2 against and 1 undecided: that's a draw :)
> > Would it be reasonable to allow a switch in the grammar generation to
> have
> > a posgres compatible syntax? Currently in Drill we use the MySQL like
> > syntax (back ticks for identifiers etc)
> >
> > [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > [5]
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >
> > - It seems a simple change in SqlCallBinding fixes the function
> > overloading: https://github.com/apache/calcite/pull/166/files
> > But that seems too easy to be true. Possibly this method is called more
> > than once (before and after the function has been resolved?)
> >
> > FYI this would happen only when using named parameter. We do want to
> > overload in this case, which is why I'm looking into it.
> >
> > I'll fill a JIRA for my other branch
> >
> > Julien
> >
> > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >>
> >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>
> >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> >> there's more than 1 function with the same name.
> >>
> >>
> >> Yes; see below.
> >>
> >> FYI: I have a prototype of TableMacro working in Drill. For now just
> being
> >> able to specify the delimiter for csv files.
> >> So it seem the answer to my question 1) is that TableMacros are the way
> to
> >> go.
> >> I'm still wondering about *3) is the table(...) wrapping syntax
> >> necessary?*
> >>
> >>
> >> Consider:
> >>
> >> select * from myTable as f(x, y)
> >> select * from myTable f(x, y)
> >> select * from myFunction(x, y)
> >>
> >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if
> f
> >> is a function with zero arguments, could you invoke it like this?:
> >>
> >> select * from f
> >>
> >> I don’t know the actual rationale. But I know that the SQL standards
> >> people in their wisdom decided to add a keyword to disambiguate.
> >>
> >> I had to fix some things in Calcite to enable this:
> >> https://github.com/dremio/calcite/pull/1/files
> >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >> Calcite for the Maze example.
> >> Which is why some hooks were missing.
> >>
> >>
> >> Can you log a jira case to track this bug?
> >>
> >>
> >> I think I found a bug in Calcite but I'd need help to fix it.
> >> Here is a test that reproduces the problem:
> >> https://github.com/apache/calcite/pull/166
> >> If we return more than 1 TableFunction with the same name, we get a NPE
> >> later on.
> >>
> >>
> >> Yes, I knew there was a problem with overloading. Please log a JIRA case
> >> on resolution of overloaded functions when invoked with named arguments.
> >> (It probably applies to all functions, not just table functions.) The
> fix
> >> will take a while (if you wait for me to write it).
> >>
> >> For now please tell your users not to overload. :)
> >>
> >>
> >> Julian
> >>
> >>
> >
> >
> > --
> > Julien
>
>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks for doing the legwork and finding what the other vendors do. It is indeed compelling that SQL Server and Postgres go beyond the standard an make the TABLE keyword optional.

I tried that syntax in Calcite and discovered that there is a clash with one of our own (few) extensions. In https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS clause. You can write

  SELECT *
  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE goldHandicap < 10;

to tell Calcite that there are two undeclared columns in the Emp table but you would like to use them in this particular query. We chose to make the EXTEND keyword optional, so you could instead write

  SELECT *
  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE goldHandicap < 10;

That is uncomfortably close to

  SELECT *
  FROM EmpFunction (favoriteBand, golfHandicap);

so we would require

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));

if EmpFunction was a table-function. You could combine the two forms like this:

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND (anotherAttribute INTEGER);

We could revisit whether EXTEND is optional, I suppose. But we should also ask whether requiring folks to type TABLE is such a hardship.

Julian


> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> - Table function syntax: I did a quick search and it seems there's no
> consensus about this.
> It seems that Posgres [1] and SQL Server [2] both allow calling table
> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> expect it.
> MySQL does not have table functions [5]
> 2 for, 2 against and 1 undecided: that's a draw :)
> Would it be reasonable to allow a switch in the grammar generation to have
> a posgres compatible syntax? Currently in Drill we use the MySQL like
> syntax (back ticks for identifiers etc)
> 
> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> [5]
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> 
> - It seems a simple change in SqlCallBinding fixes the function
> overloading: https://github.com/apache/calcite/pull/166/files
> But that seems too easy to be true. Possibly this method is called more
> than once (before and after the function has been resolved?)
> 
> FYI this would happen only when using named parameter. We do want to
> overload in this case, which is why I'm looking into it.
> 
> I'll fill a JIRA for my other branch
> 
> Julien
> 
> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> 
>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> 
>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>> there's more than 1 function with the same name.
>> 
>> 
>> Yes; see below.
>> 
>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>> able to specify the delimiter for csv files.
>> So it seem the answer to my question 1) is that TableMacros are the way to
>> go.
>> I'm still wondering about *3) is the table(...) wrapping syntax
>> necessary?*
>> 
>> 
>> Consider:
>> 
>> select * from myTable as f(x, y)
>> select * from myTable f(x, y)
>> select * from myFunction(x, y)
>> 
>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>> is a function with zero arguments, could you invoke it like this?:
>> 
>> select * from f
>> 
>> I don’t know the actual rationale. But I know that the SQL standards
>> people in their wisdom decided to add a keyword to disambiguate.
>> 
>> I had to fix some things in Calcite to enable this:
>> https://github.com/dremio/calcite/pull/1/files
>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> Calcite for the Maze example.
>> Which is why some hooks were missing.
>> 
>> 
>> Can you log a jira case to track this bug?
>> 
>> 
>> I think I found a bug in Calcite but I'd need help to fix it.
>> Here is a test that reproduces the problem:
>> https://github.com/apache/calcite/pull/166
>> If we return more than 1 TableFunction with the same name, we get a NPE
>> later on.
>> 
>> 
>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>> on resolution of overloaded functions when invoked with named arguments.
>> (It probably applies to all functions, not just table functions.) The fix
>> will take a while (if you wait for me to write it).
>> 
>> For now please tell your users not to overload. :)
>> 
>> 
>> Julian
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks for doing the legwork and finding what the other vendors do. It is indeed compelling that SQL Server and Postgres go beyond the standard an make the TABLE keyword optional.

I tried that syntax in Calcite and discovered that there is a clash with one of our own (few) extensions. In https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS clause. You can write

  SELECT *
  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE goldHandicap < 10;

to tell Calcite that there are two undeclared columns in the Emp table but you would like to use them in this particular query. We chose to make the EXTEND keyword optional, so you could instead write

  SELECT *
  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE goldHandicap < 10;

That is uncomfortably close to

  SELECT *
  FROM EmpFunction (favoriteBand, golfHandicap);

so we would require

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));

if EmpFunction was a table-function. You could combine the two forms like this:

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND (anotherAttribute INTEGER);

We could revisit whether EXTEND is optional, I suppose. But we should also ask whether requiring folks to type TABLE is such a hardship.

Julian


> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> - Table function syntax: I did a quick search and it seems there's no
> consensus about this.
> It seems that Posgres [1] and SQL Server [2] both allow calling table
> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> expect it.
> MySQL does not have table functions [5]
> 2 for, 2 against and 1 undecided: that's a draw :)
> Would it be reasonable to allow a switch in the grammar generation to have
> a posgres compatible syntax? Currently in Drill we use the MySQL like
> syntax (back ticks for identifiers etc)
> 
> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> [5]
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> 
> - It seems a simple change in SqlCallBinding fixes the function
> overloading: https://github.com/apache/calcite/pull/166/files
> But that seems too easy to be true. Possibly this method is called more
> than once (before and after the function has been resolved?)
> 
> FYI this would happen only when using named parameter. We do want to
> overload in this case, which is why I'm looking into it.
> 
> I'll fill a JIRA for my other branch
> 
> Julien
> 
> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> 
>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> 
>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>> there's more than 1 function with the same name.
>> 
>> 
>> Yes; see below.
>> 
>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>> able to specify the delimiter for csv files.
>> So it seem the answer to my question 1) is that TableMacros are the way to
>> go.
>> I'm still wondering about *3) is the table(...) wrapping syntax
>> necessary?*
>> 
>> 
>> Consider:
>> 
>> select * from myTable as f(x, y)
>> select * from myTable f(x, y)
>> select * from myFunction(x, y)
>> 
>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>> is a function with zero arguments, could you invoke it like this?:
>> 
>> select * from f
>> 
>> I don’t know the actual rationale. But I know that the SQL standards
>> people in their wisdom decided to add a keyword to disambiguate.
>> 
>> I had to fix some things in Calcite to enable this:
>> https://github.com/dremio/calcite/pull/1/files
>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> Calcite for the Maze example.
>> Which is why some hooks were missing.
>> 
>> 
>> Can you log a jira case to track this bug?
>> 
>> 
>> I think I found a bug in Calcite but I'd need help to fix it.
>> Here is a test that reproduces the problem:
>> https://github.com/apache/calcite/pull/166
>> If we return more than 1 TableFunction with the same name, we get a NPE
>> later on.
>> 
>> 
>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>> on resolution of overloaded functions when invoked with named arguments.
>> (It probably applies to all functions, not just table functions.) The fix
>> will take a while (if you wait for me to write it).
>> 
>> For now please tell your users not to overload. :)
>> 
>> 
>> Julian
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

- Table function syntax: I did a quick search and it seems there's no
consensus about this.
It seems that Posgres [1] and SQL Server [2] both allow calling table
functions without the table(...) wrapper while Oracle [3] and DB2 [4]
expect it.
MySQL does not have table functions [5]
2 for, 2 against and 1 undecided: that's a draw :)
Would it be reasonable to allow a switch in the grammar generation to have
a posgres compatible syntax? Currently in Drill we use the MySQL like
syntax (back ticks for identifiers etc)

[1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
[2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
[3] https://oracle-base.com/articles/misc/pipelined-table-functions
[4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
[5]
http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table

- It seems a simple change in SqlCallBinding fixes the function
overloading: https://github.com/apache/calcite/pull/166/files
But that seems too easy to be true. Possibly this method is called more
than once (before and after the function has been resolved?)

FYI this would happen only when using named parameter. We do want to
overload in this case, which is why I'm looking into it.

I'll fill a JIRA for my other branch

Julien

On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:

>
> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> there's more than 1 function with the same name.
>
>
> Yes; see below.
>
> FYI: I have a prototype of TableMacro working in Drill. For now just being
> able to specify the delimiter for csv files.
> So it seem the answer to my question 1) is that TableMacros are the way to
> go.
> I'm still wondering about *3) is the table(...) wrapping syntax
> necessary?*
>
>
> Consider:
>
> select * from myTable as f(x, y)
> select * from myTable f(x, y)
> select * from myFunction(x, y)
>
> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> is a function with zero arguments, could you invoke it like this?:
>
> select * from f
>
> I don’t know the actual rationale. But I know that the SQL standards
> people in their wisdom decided to add a keyword to disambiguate.
>
> I had to fix some things in Calcite to enable this:
> https://github.com/dremio/calcite/pull/1/files
> Drill uses Frameworks.getPlanner() that does not seem to be used in
> Calcite for the Maze example.
> Which is why some hooks were missing.
>
>
> Can you log a jira case to track this bug?
>
>
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166
> If we return more than 1 TableFunction with the same name, we get a NPE
> later on.
>
>
> Yes, I knew there was a problem with overloading. Please log a JIRA case
> on resolution of overloaded functions when invoked with named arguments.
> (It probably applies to all functions, not just table functions.) The fix
> will take a while (if you wait for me to write it).
>
> For now please tell your users not to overload. :)
>
>
> Julian
>
>


-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

- Table function syntax: I did a quick search and it seems there's no
consensus about this.
It seems that Posgres [1] and SQL Server [2] both allow calling table
functions without the table(...) wrapper while Oracle [3] and DB2 [4]
expect it.
MySQL does not have table functions [5]
2 for, 2 against and 1 undecided: that's a draw :)
Would it be reasonable to allow a switch in the grammar generation to have
a posgres compatible syntax? Currently in Drill we use the MySQL like
syntax (back ticks for identifiers etc)

[1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
[2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
[3] https://oracle-base.com/articles/misc/pipelined-table-functions
[4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
[5]
http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table

- It seems a simple change in SqlCallBinding fixes the function
overloading: https://github.com/apache/calcite/pull/166/files
But that seems too easy to be true. Possibly this method is called more
than once (before and after the function has been resolved?)

FYI this would happen only when using named parameter. We do want to
overload in this case, which is why I'm looking into it.

I'll fill a JIRA for my other branch

Julien

On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:

>
> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> there's more than 1 function with the same name.
>
>
> Yes; see below.
>
> FYI: I have a prototype of TableMacro working in Drill. For now just being
> able to specify the delimiter for csv files.
> So it seem the answer to my question 1) is that TableMacros are the way to
> go.
> I'm still wondering about *3) is the table(...) wrapping syntax
> necessary?*
>
>
> Consider:
>
> select * from myTable as f(x, y)
> select * from myTable f(x, y)
> select * from myFunction(x, y)
>
> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> is a function with zero arguments, could you invoke it like this?:
>
> select * from f
>
> I don’t know the actual rationale. But I know that the SQL standards
> people in their wisdom decided to add a keyword to disambiguate.
>
> I had to fix some things in Calcite to enable this:
> https://github.com/dremio/calcite/pull/1/files
> Drill uses Frameworks.getPlanner() that does not seem to be used in
> Calcite for the Maze example.
> Which is why some hooks were missing.
>
>
> Can you log a jira case to track this bug?
>
>
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166
> If we return more than 1 TableFunction with the same name, we get a NPE
> later on.
>
>
> Yes, I knew there was a problem with overloading. Please log a JIRA case
> on resolution of overloaded functions when invoked with named arguments.
> (It probably applies to all functions, not just table functions.) The fix
> will take a while (if you wait for me to write it).
>
> For now please tell your users not to overload. :)
>
>
> Julian
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when there's more than 1 function with the same name.

Yes; see below.

> FYI: I have a prototype of TableMacro working in Drill. For now just being able to specify the delimiter for csv files.
> So it seem the answer to my question 1) is that TableMacros are the way to go.
> I'm still wondering about 3) is the table(...) wrapping syntax necessary?

Consider:

select * from myTable as f(x, y)
select * from myTable f(x, y)
select * from myFunction(x, y)

#1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f is a function with zero arguments, could you invoke it like this?:

select * from f

I don’t know the actual rationale. But I know that the SQL standards people in their wisdom decided to add a keyword to disambiguate.

> I had to fix some things in Calcite to enable this: https://github.com/dremio/calcite/pull/1/files <https://github.com/dremio/calcite/pull/1/files>
> Drill uses Frameworks.getPlanner() that does not seem to be used in Calcite for the Maze example.
> Which is why some hooks were missing.

Can you log a jira case to track this bug?

> 
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166 <https://github.com/apache/calcite/pull/166>
> If we return more than 1 TableFunction with the same name, we get a NPE later on.

Yes, I knew there was a problem with overloading. Please log a JIRA case on resolution of overloaded functions when invoked with named arguments. (It probably applies to all functions, not just table functions.) The fix will take a while (if you wait for me to write it).

For now please tell your users not to overload. :)

Julian

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when there's more than 1 function with the same name.

Yes; see below.

> FYI: I have a prototype of TableMacro working in Drill. For now just being able to specify the delimiter for csv files.
> So it seem the answer to my question 1) is that TableMacros are the way to go.
> I'm still wondering about 3) is the table(...) wrapping syntax necessary?

Consider:

select * from myTable as f(x, y)
select * from myTable f(x, y)
select * from myFunction(x, y)

#1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f is a function with zero arguments, could you invoke it like this?:

select * from f

I don’t know the actual rationale. But I know that the SQL standards people in their wisdom decided to add a keyword to disambiguate.

> I had to fix some things in Calcite to enable this: https://github.com/dremio/calcite/pull/1/files <https://github.com/dremio/calcite/pull/1/files>
> Drill uses Frameworks.getPlanner() that does not seem to be used in Calcite for the Maze example.
> Which is why some hooks were missing.

Can you log a jira case to track this bug?

> 
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166 <https://github.com/apache/calcite/pull/166>
> If we return more than 1 TableFunction with the same name, we get a NPE later on.

Yes, I knew there was a problem with overloading. Please log a JIRA case on resolution of overloaded functions when invoked with named arguments. (It probably applies to all functions, not just table functions.) The fix will take a while (if you wait for me to write it).

For now please tell your users not to overload. :)

Julian

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

TL;DR: TableMacro works for me; I need help with a bug in Calcite when
there's more than 1 function with the same name.

FYI: I have a prototype of TableMacro working in Drill. For now just being
able to specify the delimiter for csv files.
So it seem the answer to my question 1) is that TableMacros are the way to
go.
I'm still wondering about *3) is the table(...) wrapping syntax necessary?*

I had to fix some things in Calcite to enable this:
https://github.com/dremio/calcite/pull/1/files
Drill uses Frameworks.getPlanner() that does not seem to be used in Calcite
for the Maze example.
Which is why some hooks were missing.

I think I found a bug in Calcite but I'd need help to fix it.
Here is a test that reproduces the problem:
https://github.com/apache/calcite/pull/166
If we return more than 1 TableFunction with the same name, we get a NPE
later on.


On Wed, Nov 4, 2015 at 9:49 AM, Julien Le Dem <ju...@dremio.com> wrote:

> Looking in more details, The DrillTable already has toRel implemented, I
> just need to add "implements TranslatableTable"
> I'll try to implement TableMacro and see what happens.
>
> On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> FYI: here are the change to Calcite I did on the Drill fork:
>> https://github.com/mapr/incubator-calcite/pull/4/files
>> I'll port to the calcite master.
>>
>> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>
>>> Thanks Julian,
>>> I have looked into using Table Functions in Drill. I had to make some
>>> modifications to the planner so that the function lookup in the Storage
>>> plugin works. I will submit a patch for that.
>>>
>>> I had a few questions:
>>>  *1)* For this particular use case it seems that we could use
>>> TableMacro as all the logic can be happening in the planner. Should I look
>>> into that?
>>>    - Drill Schema returns a DrillTable (which implements Table).
>>>    - A TableMacro returns a TranslatableTable
>>>    - It is not clear to me what a TableFunction returns as it defines
>>> only methods that return types.
>>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>>> only difference with getTable is that we use the function parameters when
>>> producing a table.
>>> For reference: Drill getTable there:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>>> It indirectly calls:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>>
>>>  *2)* The getFunctions method in Schema does not seem to be aware at
>>> all of the context it is called in. I would want to return different
>>> functions depending on where we are in the query (table functions in the
>>> from clause, regular functions in where). Is there a way to know if we are
>>> in the context of a FROM or a WHERE clause?
>>>
>>>  *3)* is the table(...) wrapping syntax necessary?
>>> Note:
>>>   - In Drill back ticks are use for identifiers containing dot or slash.
>>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>>   - single quotes are used to delimit strings: 'my string passed as a
>>> parameter'
>>>
>>>   The current syntax is something like:
>>> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
>>> delimiter => '|'))*
>>> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
>>> delimiter => '|'))*
>>> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
>>> It seems that table(...) is redundant since we are in the from clause.
>>>  It could simply be:
>>> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
>>> => '|')*
>>> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter
>>> => '|')*
>>> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>>>
>>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>>> parameter? (not the backticks instead of single quotes)
>>> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
>>> delimiter => '|')*
>>>
>>> Thank you!
>>> Julien
>>>
>>>
>>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>>> wrote:
>>>> > Agreed. We need both select with option and .drill (by etl process or
>>>> by
>>>> > sql ascribe metadata).
>>>> >
>>>> > Let's start with the select with options. My only goal would be to
>>>> make
>>>> > sure that creation of .drill file through SQL uses a similar pattern
>>>> to the
>>>> > select with options. It is also important that tables names are still
>>>> > expressed as identifiers instead of strings (people already have
>>>> enough
>>>> > trouble with remembering whether to use single quotes or backticks).
>>>> If the
>>>> > table function approach is everybody's preferred approach, I think it
>>>> is
>>>> > important to have named parameters per Julian's notes.
>>>> >
>>>> > @Julian, how hard do you think it will be to add named parameters?
>>>>
>>>> I just checked in a fix for
>>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

TL;DR: TableMacro works for me; I need help with a bug in Calcite when
there's more than 1 function with the same name.

FYI: I have a prototype of TableMacro working in Drill. For now just being
able to specify the delimiter for csv files.
So it seem the answer to my question 1) is that TableMacros are the way to
go.
I'm still wondering about *3) is the table(...) wrapping syntax necessary?*

I had to fix some things in Calcite to enable this:
https://github.com/dremio/calcite/pull/1/files
Drill uses Frameworks.getPlanner() that does not seem to be used in Calcite
for the Maze example.
Which is why some hooks were missing.

I think I found a bug in Calcite but I'd need help to fix it.
Here is a test that reproduces the problem:
https://github.com/apache/calcite/pull/166
If we return more than 1 TableFunction with the same name, we get a NPE
later on.


On Wed, Nov 4, 2015 at 9:49 AM, Julien Le Dem <ju...@dremio.com> wrote:

> Looking in more details, The DrillTable already has toRel implemented, I
> just need to add "implements TranslatableTable"
> I'll try to implement TableMacro and see what happens.
>
> On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> FYI: here are the change to Calcite I did on the Drill fork:
>> https://github.com/mapr/incubator-calcite/pull/4/files
>> I'll port to the calcite master.
>>
>> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>
>>> Thanks Julian,
>>> I have looked into using Table Functions in Drill. I had to make some
>>> modifications to the planner so that the function lookup in the Storage
>>> plugin works. I will submit a patch for that.
>>>
>>> I had a few questions:
>>>  *1)* For this particular use case it seems that we could use
>>> TableMacro as all the logic can be happening in the planner. Should I look
>>> into that?
>>>    - Drill Schema returns a DrillTable (which implements Table).
>>>    - A TableMacro returns a TranslatableTable
>>>    - It is not clear to me what a TableFunction returns as it defines
>>> only methods that return types.
>>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>>> only difference with getTable is that we use the function parameters when
>>> producing a table.
>>> For reference: Drill getTable there:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>>> It indirectly calls:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>>
>>>  *2)* The getFunctions method in Schema does not seem to be aware at
>>> all of the context it is called in. I would want to return different
>>> functions depending on where we are in the query (table functions in the
>>> from clause, regular functions in where). Is there a way to know if we are
>>> in the context of a FROM or a WHERE clause?
>>>
>>>  *3)* is the table(...) wrapping syntax necessary?
>>> Note:
>>>   - In Drill back ticks are use for identifiers containing dot or slash.
>>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>>   - single quotes are used to delimit strings: 'my string passed as a
>>> parameter'
>>>
>>>   The current syntax is something like:
>>> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
>>> delimiter => '|'))*
>>> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
>>> delimiter => '|'))*
>>> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
>>> It seems that table(...) is redundant since we are in the from clause.
>>>  It could simply be:
>>> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
>>> => '|')*
>>> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter
>>> => '|')*
>>> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>>>
>>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>>> parameter? (not the backticks instead of single quotes)
>>> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
>>> delimiter => '|')*
>>>
>>> Thank you!
>>> Julien
>>>
>>>
>>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>>> wrote:
>>>> > Agreed. We need both select with option and .drill (by etl process or
>>>> by
>>>> > sql ascribe metadata).
>>>> >
>>>> > Let's start with the select with options. My only goal would be to
>>>> make
>>>> > sure that creation of .drill file through SQL uses a similar pattern
>>>> to the
>>>> > select with options. It is also important that tables names are still
>>>> > expressed as identifiers instead of strings (people already have
>>>> enough
>>>> > trouble with remembering whether to use single quotes or backticks).
>>>> If the
>>>> > table function approach is everybody's preferred approach, I think it
>>>> is
>>>> > important to have named parameters per Julian's notes.
>>>> >
>>>> > @Julian, how hard do you think it will be to add named parameters?
>>>>
>>>> I just checked in a fix for
>>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Looking in more details, The DrillTable already has toRel implemented, I
just need to add "implements TranslatableTable"
I'll try to implement TableMacro and see what happens.

On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:

> FYI: here are the change to Calcite I did on the Drill fork:
> https://github.com/mapr/incubator-calcite/pull/4/files
> I'll port to the calcite master.
>
> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> Thanks Julian,
>> I have looked into using Table Functions in Drill. I had to make some
>> modifications to the planner so that the function lookup in the Storage
>> plugin works. I will submit a patch for that.
>>
>> I had a few questions:
>>  *1)* For this particular use case it seems that we could use TableMacro
>> as all the logic can be happening in the planner. Should I look into that?
>>    - Drill Schema returns a DrillTable (which implements Table).
>>    - A TableMacro returns a TranslatableTable
>>    - It is not clear to me what a TableFunction returns as it defines
>> only methods that return types.
>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>> only difference with getTable is that we use the function parameters when
>> producing a table.
>> For reference: Drill getTable there:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>> It indirectly calls:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>
>>  *2)* The getFunctions method in Schema does not seem to be aware at all
>> of the context it is called in. I would want to return different functions
>> depending on where we are in the query (table functions in the from clause,
>> regular functions in where). Is there a way to know if we are in the
>> context of a FROM or a WHERE clause?
>>
>>  *3)* is the table(...) wrapping syntax necessary?
>> Note:
>>   - In Drill back ticks are use for identifiers containing dot or slash.
>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>   - single quotes are used to delimit strings: 'my string passed as a
>> parameter'
>>
>>   The current syntax is something like:
>> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
>> It seems that table(...) is redundant since we are in the from clause.
>>  It could simply be:
>> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
>> => '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
>> '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>>
>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>> parameter? (not the backticks instead of single quotes)
>> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
>> delimiter => '|')*
>>
>> Thank you!
>> Julien
>>
>>
>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>
>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>> wrote:
>>> > Agreed. We need both select with option and .drill (by etl process or
>>> by
>>> > sql ascribe metadata).
>>> >
>>> > Let's start with the select with options. My only goal would be to make
>>> > sure that creation of .drill file through SQL uses a similar pattern
>>> to the
>>> > select with options. It is also important that tables names are still
>>> > expressed as identifiers instead of strings (people already have enough
>>> > trouble with remembering whether to use single quotes or backticks).
>>> If the
>>> > table function approach is everybody's preferred approach, I think it
>>> is
>>> > important to have named parameters per Julian's notes.
>>> >
>>> > @Julian, how hard do you think it will be to add named parameters?
>>>
>>> I just checked in a fix for
>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Looking in more details, The DrillTable already has toRel implemented, I
just need to add "implements TranslatableTable"
I'll try to implement TableMacro and see what happens.

On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:

> FYI: here are the change to Calcite I did on the Drill fork:
> https://github.com/mapr/incubator-calcite/pull/4/files
> I'll port to the calcite master.
>
> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> Thanks Julian,
>> I have looked into using Table Functions in Drill. I had to make some
>> modifications to the planner so that the function lookup in the Storage
>> plugin works. I will submit a patch for that.
>>
>> I had a few questions:
>>  *1)* For this particular use case it seems that we could use TableMacro
>> as all the logic can be happening in the planner. Should I look into that?
>>    - Drill Schema returns a DrillTable (which implements Table).
>>    - A TableMacro returns a TranslatableTable
>>    - It is not clear to me what a TableFunction returns as it defines
>> only methods that return types.
>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>> only difference with getTable is that we use the function parameters when
>> producing a table.
>> For reference: Drill getTable there:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>> It indirectly calls:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>
>>  *2)* The getFunctions method in Schema does not seem to be aware at all
>> of the context it is called in. I would want to return different functions
>> depending on where we are in the query (table functions in the from clause,
>> regular functions in where). Is there a way to know if we are in the
>> context of a FROM or a WHERE clause?
>>
>>  *3)* is the table(...) wrapping syntax necessary?
>> Note:
>>   - In Drill back ticks are use for identifiers containing dot or slash.
>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>   - single quotes are used to delimit strings: 'my string passed as a
>> parameter'
>>
>>   The current syntax is something like:
>> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
>> It seems that table(...) is redundant since we are in the from clause.
>>  It could simply be:
>> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
>> => '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
>> '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>>
>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>> parameter? (not the backticks instead of single quotes)
>> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
>> delimiter => '|')*
>>
>> Thank you!
>> Julien
>>
>>
>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>
>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>> wrote:
>>> > Agreed. We need both select with option and .drill (by etl process or
>>> by
>>> > sql ascribe metadata).
>>> >
>>> > Let's start with the select with options. My only goal would be to make
>>> > sure that creation of .drill file through SQL uses a similar pattern
>>> to the
>>> > select with options. It is also important that tables names are still
>>> > expressed as identifiers instead of strings (people already have enough
>>> > trouble with remembering whether to use single quotes or backticks).
>>> If the
>>> > table function approach is everybody's preferred approach, I think it
>>> is
>>> > important to have named parameters per Julian's notes.
>>> >
>>> > @Julian, how hard do you think it will be to add named parameters?
>>>
>>> I just checked in a fix for
>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

FYI: here are the change to Calcite I did on the Drill fork:
https://github.com/mapr/incubator-calcite/pull/4/files
I'll port to the calcite master.

On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Thanks Julian,
> I have looked into using Table Functions in Drill. I had to make some
> modifications to the planner so that the function lookup in the Storage
> plugin works. I will submit a patch for that.
>
> I had a few questions:
>  *1)* For this particular use case it seems that we could use TableMacro
> as all the logic can be happening in the planner. Should I look into that?
>    - Drill Schema returns a DrillTable (which implements Table).
>    - A TableMacro returns a TranslatableTable
>    - It is not clear to me what a TableFunction returns as it defines only
> methods that return types.
>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
> only difference with getTable is that we use the function parameters when
> producing a table.
> For reference: Drill getTable there:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
> It indirectly calls:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>
>  *2)* The getFunctions method in Schema does not seem to be aware at all
> of the context it is called in. I would want to return different functions
> depending on where we are in the query (table functions in the from clause,
> regular functions in where). Is there a way to know if we are in the
> context of a FROM or a WHERE clause?
>
>  *3)* is the table(...) wrapping syntax necessary?
> Note:
>   - In Drill back ticks are use for identifiers containing dot or slash.
> like the path to the file as a table name: dfs.`/path/to/file.ext`
>   - single quotes are used to delimit strings: 'my string passed as a
> parameter'
>
>   The current syntax is something like:
> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
> It seems that table(...) is redundant since we are in the from clause.
>  It could simply be:
> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
> => '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
> '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>
>  *4)* Can a table be a parameter? If yes, how do we declare a table
> parameter? (not the backticks instead of single quotes)
> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
> delimiter => '|')*
>
> Thank you!
> Julien
>
>
> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> > Agreed. We need both select with option and .drill (by etl process or by
>> > sql ascribe metadata).
>> >
>> > Let's start with the select with options. My only goal would be to make
>> > sure that creation of .drill file through SQL uses a similar pattern to
>> the
>> > select with options. It is also important that tables names are still
>> > expressed as identifiers instead of strings (people already have enough
>> > trouble with remembering whether to use single quotes or backticks). If
>> the
>> > table function approach is everybody's preferred approach, I think it is
>> > important to have named parameters per Julian's notes.
>> >
>> > @Julian, how hard do you think it will be to add named parameters?
>>
>> I just checked in a fix for
>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

FYI: here are the change to Calcite I did on the Drill fork:
https://github.com/mapr/incubator-calcite/pull/4/files
I'll port to the calcite master.

On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Thanks Julian,
> I have looked into using Table Functions in Drill. I had to make some
> modifications to the planner so that the function lookup in the Storage
> plugin works. I will submit a patch for that.
>
> I had a few questions:
>  *1)* For this particular use case it seems that we could use TableMacro
> as all the logic can be happening in the planner. Should I look into that?
>    - Drill Schema returns a DrillTable (which implements Table).
>    - A TableMacro returns a TranslatableTable
>    - It is not clear to me what a TableFunction returns as it defines only
> methods that return types.
>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
> only difference with getTable is that we use the function parameters when
> producing a table.
> For reference: Drill getTable there:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
> It indirectly calls:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>
>  *2)* The getFunctions method in Schema does not seem to be aware at all
> of the context it is called in. I would want to return different functions
> depending on where we are in the query (table functions in the from clause,
> regular functions in where). Is there a way to know if we are in the
> context of a FROM or a WHERE clause?
>
>  *3)* is the table(...) wrapping syntax necessary?
> Note:
>   - In Drill back ticks are use for identifiers containing dot or slash.
> like the path to the file as a table name: dfs.`/path/to/file.ext`
>   - single quotes are used to delimit strings: 'my string passed as a
> parameter'
>
>   The current syntax is something like:
> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
> It seems that table(...) is redundant since we are in the from clause.
>  It could simply be:
> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
> => '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
> '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>
>  *4)* Can a table be a parameter? If yes, how do we declare a table
> parameter? (not the backticks instead of single quotes)
> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
> delimiter => '|')*
>
> Thank you!
> Julien
>
>
> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> > Agreed. We need both select with option and .drill (by etl process or by
>> > sql ascribe metadata).
>> >
>> > Let's start with the select with options. My only goal would be to make
>> > sure that creation of .drill file through SQL uses a similar pattern to
>> the
>> > select with options. It is also important that tables names are still
>> > expressed as identifiers instead of strings (people already have enough
>> > trouble with remembering whether to use single quotes or backticks). If
>> the
>> > table function approach is everybody's preferred approach, I think it is
>> > important to have named parameters per Julian's notes.
>> >
>> > @Julian, how hard do you think it will be to add named parameters?
>>
>> I just checked in a fix for
>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>
>
>
>
> --
> Julien
>



-- 
Julien