Posted to dev@drill.apache.org by Julien Le Dem <ju...@dremio.com> on 2015/10/19 23:28:52 UTC

select from table with options

I'm looking into passing information on how to interpret a file through the
select clause in Drill.
Something along the lines of:
*select * from
dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
(In this example, we want to specify a specific delimiter, but that would
apply to any *type* of format)

This would allow reading a file without having to centrally configure
formats: https://drill.apache.org/docs/querying-plain-text-files/
That makes it easier to try reading an existing file.
Typically once the user has found the proper settings, they would update
the central configuration.
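
To make the idea concrete, here is a minimal sketch of how a path carrying
options in query-string form could be split into the file path and its format
options. It is only an illustration in Python, not how Drill does or would
implement it; the function name split_table_options is hypothetical, and it
assumes options are separated by & and the path itself contains no ?.

```python
from urllib.parse import parse_qsl


def split_table_options(table_path):
    """Split 'path?key=value&...' into (path, options dict).

    Hypothetical helper for illustration only; not part of Drill or Calcite.
    """
    # Everything before the first '?' is the file path; the rest is the options.
    path, sep, query = table_path.partition("?")
    # keep_blank_values so that an option such as 'delimiter=' is not dropped.
    options = dict(parse_qsl(query, keep_blank_values=True)) if sep else {}
    return path, options


path, options = split_table_options(
    "/path/to/file/something.psv?type=text&delimiter=|"
)
print(path)
print(options)
```

One caveat with this encoding: a delimiter that is itself & or + would have to
be percent-encoded in the path, which is part of why a function-call style
syntax with quoted string arguments may be easier to read.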

Thoughts?

-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks! Let me know if you need help.

> On Nov 13, 2015, at 9:29 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Here you go: https://issues.apache.org/jira/browse/CALCITE-967
> I was planning on providing a patch for both master and the fork, but I
> haven't started yet.
> 
> On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> You’re hitting the grammar ambiguity I expected.
>> 
>> I think that base Calcite should require the full verbose syntax: the
>> TABLE keyword for table functions and the EXTEND keyword for extends
>> clauses. Then Drill can override to make TABLE optional, and Phoenix can
>> override to make EXTEND optional.
>> 
>> Are you changing the parser in your forked copy of Calcite, or are you
>> changing Drill’s extensions to that parser?
>> 
>> If the former, you (or I) should add extension points to Calcite’s parser
>> to make the TABLE keyword optional and to make the EXTEND keyword optional. No
>> project should enable both extension points; otherwise they’ll end up with
>> an ambiguous grammar. If you agree, create a Calcite JIRA case for this.
>> 
>> Julian
>> 
>> 
>>> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> 
>>> Hi,
>>> I've been trying to enable this but it looks like in the current grammar
>>> (before my change) you cannot use table functions and EXTEND together.
>>> That's because they are on different branches of an | in the grammar.
>>> So I would suggest that we treat those as two separate improvements in two
>>> different pull requests:
>>> - not require table(...) to call table functions
>>> - allow using table functions and extend together.
>>> Does it make sense?
>>> Julien
>>> 
>>> 
>>> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> To be clear, it should be possible to use a table function with all of
>>>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>>>> aliases, TABLESAMPLE.
>>>> 
>>>> I'm surprised that the parser didn't need more lookahead to choose
>>>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>>>> 
>>>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> In the patch I just sent, probably not.
>>>>> I will adjust it and add the corresponding test.
>>>>> 
>>>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>> 
>>>>>> Can you use both together? Say
>>>>>> 
>>>>>> select columns
>>>>>> from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
>>>>>> EXTEND (foo INTEGER)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>>>> 
>>>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>>>> Calcite.
>>>>>>> I have verified that both the table function and extend (with or without
>>>>>>> keyword) work
>>>>>>> 
>>>>>>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>>>> 
>>>>>>> These work:
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>>>>> fieldDelimiter => '|')
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>>>> fieldDelimiter => '|'))
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think we
>>>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> That being said, I think we could make these work together... maybe.
>>>>>>>> 
>>>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>>>> 
>>>>>>>> Create a new sub block in the table block that requires no keyword. There
>>>>>>>> would be two paths (and would probably require some lookahead)
>>>>>>>> 
>>>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>>>> golfHandicap INTEGER)
>>>>>>>> 
>>>>>>>> Then we create a table function with options 1 & 2, an EXTENDS clause for
>>>>>>>> option 3.
>>>>>>>> 
>>>>>>>> Best of both worlds?
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Phoenix already supports columns at read-time using the syntax without the
>>>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>> 
>>>>>>>>> Changing this by requiring the EXTENDS keyword would create a backward
>>>>>>>>> compatibility problem.
>>>>>>>>> 
>>>>>>>>> I think it'd be good if both of these extensions worked in Drill & Phoenix
>>>>>>>>> given our Drillix initiative.
>>>>>>>>> 
>>>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>>>>> 
>>>>>>>>>> My proposal was an a or b using the freemarker template in the grammar,
>>>>>>>>>> not something later.
>>>>>>>>>> 
>>>>>>>>>> Actually, put another way: we may want to consider stating that we only
>>>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions should be
>>>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite (the same
>>>>>>>>>> way we plug in external things in Drill).
>>>>>>>>>> 
>>>>>>>>>> Trying to get every project to agree on extensions seems like it may be
>>>>>>>>>> hard.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>>>> 
>>>>>>>>>>> However a “switch” in a grammar is a bad idea. Grammars need to be
>>>>>>>>>>> predictable. Any variation should happen at validation time, or later.
>>>>>>>>>>> 
>>>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>>>>>>>>>>> tough design discussion.
>>>>>>>>>>> 
>>>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
>>>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>>>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>>>>>>>>>>> should be.
>>>>>>>>>>> 
>>>>>>>>>>> Julian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking at those two examples I agree with Jacques. The first appears more
>>>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we could
>>>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it, it seems
>>>>>>>>>>>>> reasonable to support the table functions without the TABLE syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I for one definitely think the TABLE syntax is much more confusing to use,
>>>>>>>>>>>>> especially in the example that we're looking to support, such as:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>>>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
>>>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It also looks much more like a hint to the table (which is our goal).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the standard and
>>>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a clash with
>>>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
>>>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp table but
>>>>>>>>>>>>>> you would like to use them in this particular query. We chose to make the
>>>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> so we would require
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two forms like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we should also
>>>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
>>>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>>>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>>>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to have
>>>>>>>>>>>>>>> a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
>>>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is called more
>>>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FYI this would happen only when using named parameters. We do want to
>>>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'll file a JIRA for my other branch
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>>>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>>>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>>>> So it seems the answer to my question 1) is that TableMacros are the way to
>>>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>>>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>>>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we get a NPE
>>>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>>>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named arguments.
>>>>>>>>>>>>>>>> (It probably applies to all functions, not just table functions.) The fix
>>>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> *Jim Scott*
>>>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>>>> 
>>>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>>>> 
>>>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Julien
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks! Let me know if you need help.

> On Nov 13, 2015, at 9:29 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Here you go: https://issues.apache.org/jira/browse/CALCITE-967
> I was planning on providing patch for both master and the fork, but I
> haven't started yet.
> 
> On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> You’re hitting the grammar ambiguity I expected.
>> 
>> I think that base Calcite should require the full verbose syntax: the
>> TABLE keyword for table functions and the EXTEND keyword for extends
>> clauses. Then Drill can override to make TABLE optional, and Phoenix can
>> override to make EXTEND optional.
>> 
>> Are you changing the parser in your forked copy of Calcite, or are you
>> changing Drill’s extensions to that parser?
>> 
>> If the former, you (or I) should add extension points to Calcite’s parser
>> make the TABLE keyword optional and to make the EXTEND keyword optional. No
>> project should enable both extension points — otherwise they’ll end up with
>> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>> 
>> Julian
>> 
>> 
>>> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> 
>>> Hi,
>>> I've been trying to enable this but it looks like in the current grammar
>>> (before my change) you can not use table functions and EXTEND together.
>>> That's because they are on difference branches of an | in the grammar.
>>> So I would suggest that we treat those as two separate improvement in two
>>> different pull requests:
>>> - not require table(...) to call table functions
>>> - allow using table functions and extend together.
>>> Does it make sense?
>>> Julien
>>> 
>>> 
>>> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> To be clear, it should be possible to use a table function with all of
>>>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>>>> aliases, TABLESAMPLE.
>>>> 
>>>> I'm surprised that the parser didn't need more lookahead to choose
>>>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>>>> 
>>>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> In the patch I just sent, probably not.
>>>>> I will adjust it and add the corresponding test.
>>>>> 
>>>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>> 
>>>>>> Can you use both together? Say
>>>>>> 
>>>>>> select columns
>>>>>> from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
>>>> EXTEND
>>>>>> (foo INTEGER)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>>>> wrote:
>>>>>>> 
>>>>>>> I took a stab at adding the TableFunction syntax without table(...)
>> in
>>>>>>> Calcite.
>>>>>>> I have verified that both the table function and extend (with or
>>>> without
>>>>>>> keyword) work
>>>>>>> 
>>>>>> 
>>>> 
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>>>> 
>>>>>>> These work:
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>> fieldDelimiter
>>>>>> =>
>>>>>>> '|')
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>>>> fieldDelimiter => '|'))
>>>>>>> 
>>>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>>>> 
>>>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>>>> we
>>>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> That being said, I think we could make these work together... maybe.
>>>>>>>> 
>>>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>>>> 
>>>>>>>> Create a new sub block in the table block that requires no keyword.
>>>>>> There
>>>>>>>> would be two paths (and would probably require some lookahead)
>>>>>>>> 
>>>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>>>> golfHandicap INTEGER)
>>>>>>>> 
>>>>>>>> Then we create a table function with options 1 & 2, an EXTENDS
>> clause
>>>>>> for
>>>>>>>> option 3.
>>>>>>>> 
>>>>>>>> Best of both worlds?
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
>> jamestaylor@apache.org
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Phoenix already supports columns at read-time using the syntax
>>>> without
>>>>>>>> the
>>>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>>> SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>> 
>>>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>>>> backward
>>>>>>>>> compatibility problem.
>>>>>>>>> 
>>>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>>>> Phoenix
>>>>>>>>> given our Drillix initiative.
>>>>>>>>> 
>>>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> My proposal was an a or b using the freemarker template in the
>>>>>> grammar,
>>>>>>>>>> not something later.
>>>>>>>>>> 
>>>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>>>> only
>>>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>>>> should
>>>>>>>>> be
>>>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>>>> (the
>>>>>>>>> same
>>>>>>>>>> way we plug in external things in Drill).
>>>>>>>>>> 
>>>>>>>>>> Trying to get every project to agree on extensions seems like it
>>>> may
>>>>>> be
>>>>>>>>>> hard.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>>>> 
>>>>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
>> be
>>>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>>>> later.
>>>>>>>>>>> 
>>>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>>>> avoiding
>>>>>> a
>>>>>>>>>>> tough design discussion.
>>>>>>>>>>> 
>>>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
>> and
>>>>>>>> they
>>>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>>>> Phoenix
>>>>>>>> (by
>>>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>>>> SQL
>>>>>>>>> syntax
>>>>>>>>>>> should be.
>>>>>>>>>>> 
>>>>>>>>>>> Julian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>>>> appears
>>>>>>>>>>> more
>>>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>>>> jacques@dremio.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>>>> we
>>>>>>>>> could
>>>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>>>> seems
>>>>>>>>>>>>> reasonable to support the table functions without the TABLE
>>>> syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>>>> confusing
>>>>>>>> to
>>>>>>>>>>> use,
>>>>>>>>>>>>> especially in the example that we're looking to support, such
>>>> as:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>> fieldDelimiter
>>>>>>>>> =>
>>>>>>>>>>>>> '|', skipFirstRow => true)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter
>>>>>>>>>>>>> => '|', skipFirstRow => true))
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>>>> goal).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for doing the legwork and finding what the other
>> vendors
>>>>>>>> do.
>>>>>>>>>>> It is
>>>>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>>> standard
>>>>>>>>>>> an
>>>>>>>>>>>>>> make the TABLE keyword optional.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>>>> clash
>>>>>>>>>>> with
>>>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
>> the
>>>>>>>>>>> EXTENDS
>>>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
>>>> INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>>>> Emp
>>>>>>>>> table
>>>>>>>>>>>>> but
>>>>>>>>>>>>>> you would like to use them in this particular query. We chose
>>>> to
>>>>>>>>> make
>>>>>>>>>>> the
>>>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>>>> WHERE goldHandicap < 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> so we would require
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>>>> forms
>>>>>>>>>>> like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> SELECT *
>>>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>>>> should
>>>>>>>>>>>>> also
>>>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
>>>>>>>> there's
>>>>>>>>> no
>>>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
>>>> calling
>>>>>>>>> table
>>>>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
>>>> DB2
>>>>>>>>> [4]
>>>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>>>> MySQL does not have table functions [5]
>>>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
>>>>>>>> generation
>>>>>>>>> to
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
>>>> MySQL
>>>>>>>>> like
>>>>>>>>>>>>>>> syntax (back ticks for identifiers etc)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> 
>>>>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>>>> [3]
>>>>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>>>> [4]
>>>>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>>>> function
>>>>>>>>>>>>>>> overloading:
>> https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>>>> called
>>>>>>>>>>> more
>>>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>>>> want
>>>>>>>>> to
>>>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'll fill a JIRA for my other branch
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
>> jhyde@apache.org
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
>> julien@dremio.com
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>>>> Calcite
>>>>>>>>>>> when
>>>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>>>> now
>>>>>>>>> just
>>>>>>>>>>>>>> being
>>>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>>>> are
>>>>>>>>> the
>>>>>>>>>>>>> way
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>>>> syntax
>>>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>>>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner(), which does not seem to be used in
>>>>>>>>>>>>>>>> Calcite for the Maze example. That is why some hooks were missing.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we get an NPE
>>>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>>>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named arguments.
>>>>>>>>>>>>>>>> (It probably applies to all functions, not just table functions.) The fix
>>>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Julien
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> *Jim Scott*
>>>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>>>> 
>>>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>>>> 
>>>>>>>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>>>>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Julien
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Here you go: https://issues.apache.org/jira/browse/CALCITE-967
I was planning on providing a patch for both master and the fork, but I
haven't started yet.

On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:

> You’re hitting the grammar ambiguity I expected.
>
> I think that base Calcite should require the full verbose syntax: the
> TABLE keyword for table functions and the EXTEND keyword for extends
> clauses. Then Drill can override to make TABLE optional, and Phoenix can
> override to make EXTEND optional.
>
> Are you changing the parser in your forked copy of Calcite, or are you
> changing Drill’s extensions to that parser?
>
> If the former, you (or I) should add extension points to Calcite’s parser
> to make the TABLE keyword optional and to make the EXTEND keyword optional. No
> project should enable both extension points; otherwise they’ll end up with
> an ambiguous grammar. If you agree, create a Calcite JIRA case for this.
>
> Julian
>
>
> > On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > Hi,
> > I've been trying to enable this but it looks like in the current grammar
> > (before my change) you cannot use table functions and EXTEND together.
> > That's because they are on different branches of an | in the grammar.
> > So I would suggest that we treat those as two separate improvements in two
> > different pull requests:
> > - not require table(...) to call table functions
> > - allow using table functions and extend together.
> > Does it make sense?
> > Julien
> >
> >
> > On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> To be clear, it should be possible to use a table function with all of
> >> the options -- EXTENDS clause, OVER clause, AS with alias and column
> >> aliases, TABLESAMPLE.
> >>
> >> I'm surprised that the parser didn't need more lookahead to choose
> >> between 't (x, y)' and 't (x INTEGER, y DATE)'.
> >>
> >> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>> In the patch I just sent, probably not.
> >>> I will adjust it and add the corresponding test.
> >>>
> >>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
> >>>
> >>>> Can you use both together? Say
> >>>>
> >>>>  select columns
> >>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
> >>>>  EXTEND (foo INTEGER)
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> >>>>>
> >>>>> I took a stab at adding the TableFunction syntax without table(...) in
> >>>>> Calcite.
> >>>>> I have verified that both the table function and EXTEND (with or without
> >>>>> the keyword) work:
> >>>>>
> >>>>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >>>>>
> >>>>> These work:
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> >>>>> fieldDelimiter => '|')
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >>>>> fieldDelimiter => '|'))
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`('JSON')
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >>>>>
> >>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
> >>>>>
> >>>>>> Drill does implicitly what Phoenix does explicitly so I don't think we
> >>>>>> should constrain ourselves to having a union of the two syntaxes.
> >>>>>>
> >>>>>>
> >>>>>> That being said, I think we could make these work together... maybe.
> >>>>>>
> >>>>>> Remove the EXTEND-without-keyword syntax from the grammar.
> >>>>>>
> >>>>>> Create a new sub block in the table block that requires no keyword. There
> >>>>>> would be two paths (and it would probably require some lookahead):
> >>>>>>
> >>>>>> option 1> unnamed parameters (1, 2, 3)
> >>>>>> option 2> named parameters (a => 1, b => 2, c => 3)
> >>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >>>>>> golfHandicap INTEGER)
> >>>>>>
> >>>>>> Then we create a table function with options 1 & 2, an EXTEND clause for
> >>>>>> option 3.
> >>>>>>
> >>>>>> Best of both worlds?
> >>>>>>
> >>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Phoenix already supports columns at read-time using the syntax without the
> >>>>>>> EXTEND keyword as Julian indicated:
> >>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>  WHERE golfHandicap < 10;
> >>>>>>>
> >>>>>>> Changing this by requiring the EXTEND keyword would create a backward
> >>>>>>> compatibility problem.
> >>>>>>>
> >>>>>>> I think it'd be good if both of these extensions worked in Drill & Phoenix
> >>>>>>> given our Drillix initiative.
> >>>>>>>
> >>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> My proposal was an a-or-b choice using the FreeMarker template in the
> >>>>>>>> grammar, not something later.
> >>>>>>>>
> >>>>>>>> Actually, put another way: we may want to consider stating that we only
> >>>>>>>> incorporate SQL standards in our primary grammar. Any extensions should be
> >>>>>>>> optional grammar. We could simply have grammar plugins in Calcite (the same
> >>>>>>>> way we plug in external things in Drill).
> >>>>>>>>
> >>>>>>>> Trying to get every project to agree on extensions seems like it may be
> >>>>>>>> hard.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jacques Nadeau
> >>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>
> >>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> I can see why Jacques wants this syntax.
> >>>>>>>>>
> >>>>>>>>> However a "switch" in a grammar is a bad idea. Grammars need to be
> >>>>>>>>> predictable. Any variation should happen at validation time, or later.
> >>>>>>>>>
> >>>>>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
> >>>>>>>>> tough design discussion.
> >>>>>>>>>
> >>>>>>>>> EXTEND and eliding TABLE are both extensions to standard SQL, and they
> >>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> >>>>>>>>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
> >>>>>>>>> should be.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Looking at those two examples I agree with Jacques. The first appears more
> >>>>>>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we could
> >>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it, it seems
> >>>>>>>>>>> reasonable to support the table functions without the TABLE syntax.
> >>>>>>>>>>>
> >>>>>>>>>>> I for one definitely think the TABLE syntax is much more confusing to use,
> >>>>>>>>>>> especially in the example that we're looking to support, such as:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> >>>>>>>>>>> '|', skipFirstRow => true)
> >>>>>>>>>>>
> >>>>>>>>>>> This seems much clearer than:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> >>>>>>>>>>> => '|', skipFirstRow => true))
> >>>>>>>>>>>
> >>>>>>>>>>> It also looks much more like a hint to the table (which is our goal).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors do. It is
> >>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the standard and
> >>>>>>>>>>>> make the TABLE keyword optional.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a clash with
> >>>>>>>>>>>> one of our own (few) extensions. In
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
> >>>>>>>>>>>> clause. You can write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>>>>>> WHERE golfHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp table but
> >>>>>>>>>>>> you would like to use them in this particular query. We chose to make the
> >>>>>>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>>>>>> WHERE golfHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> That is uncomfortably close to
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>>>>>
> >>>>>>>>>>>> so we would require
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>>>>>
> >>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two forms like
> >>>>>>>>>>>> this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we should also
> >>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
> >>>>>>>>>>>>> consensus about this.
> >>>>>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
> >>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> >>>>>>>>>>>>> expect it.
> >>>>>>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to have
> >>>>>>>>>>>>> a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
> >>>>>>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is called more
> >>>>>>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> FYI this would happen only when using named parameters. We do want to
> >>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll file a JIRA for my other branch.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> >>>>>>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes; see below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just being
> >>>>>>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>>>>>> So it seems the answer to my question 1) is that TableMacros are the way to
> >>>>>>>>>>>>>> go.
> >>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>>>>>>>>>>> necessary?*
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Consider:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> >>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like this?:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from f
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL standards
> >>>>>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>>>>>> Drill uses Frameworks.getPlanner(), which does not seem to be used in
> >>>>>>>>>>>>>> Calcite for the Maze example. That is why some hooks were missing.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>>>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we get an NPE
> >>>>>>>>>>>>>> later on.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA case
> >>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named arguments.
> >>>>>>>>>>>>>> (It probably applies to all functions, not just table functions.) The fix
> >>>>>>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> *Jim Scott*
> >>>>>>>>>> Director, Enterprise Strategy & Architecture
> >>>>>>>>>> +1 (347) 746-9281
> >>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>>>>>
> >>>>>>>>>> <http://www.mapr.com/>
> >>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >>>>>>>>>>
> >>>>>>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien
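
The lookahead question raised in the quoted discussion above (telling
't (x, y)' apart from 't (x INTEGER, y DATE)' and from named arguments such
as (a => 1)) can be illustrated with a small classifier. This is only a
sketch in Python under simplifying assumptions: the tokenizer, the classify
function and its three labels are hypothetical names made up for this
illustration. It is not Calcite's parser or API, and it ignores most of
what a real SQL grammar has to handle.

```python
import re

# Hypothetical token pattern (an assumption for this sketch): the "=>" arrow,
# identifiers/keywords, integers, single-quoted strings, and "(", ")", ",".
TOKEN = re.compile(r"\s*(=>|[A-Za-z_][A-Za-z_0-9]*|\d+|'[^']*'|[(),])")
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def tokenize(text):
    """Split a parenthesized list such as "(a => 1, b => 2)" into tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def classify(paren_list):
    """Guess what follows a table name by peeking at two tokens after '('."""
    tokens = tokenize(paren_list)
    if len(tokens) < 2 or tokens[0] != "(":
        raise ValueError("expected a parenthesized list")
    first = tokens[1]
    second = tokens[2] if len(tokens) > 2 else None
    if first == ")":
        return "positional arguments"   # empty argument list
    if second == "=>":
        return "named arguments"        # option 2 in the thread: (a => 1, ...)
    if IDENT.fullmatch(first) and second is not None and IDENT.fullmatch(second):
        return "column declarations"    # option 3: a name followed by a type
    return "positional arguments"       # option 1: (1, 2, 3) or (x, y)


if __name__ == "__main__":
    for sample in ("(1, 2, 3)",
                   "(a => 1, b=>2, c=> 3)",
                   "(favoriteBand VARCHAR(100), golfHandicap INTEGER)"):
        print(sample, "->", classify(sample))
```

In this toy version two tokens after the opening parenthesis are enough to
pick a branch. It does not address the separate ambiguity with column
aliases ('myTable f(x, y)') that was also mentioned in the thread.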

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Here you go: https://issues.apache.org/jira/browse/CALCITE-967
I was planning on providing patch for both master and the fork, but I
haven't started yet.

On Thu, Nov 12, 2015 at 8:34 PM, Julian Hyde <jh...@apache.org> wrote:

> You’re hitting the grammar ambiguity I expected.
>
> I think that base Calcite should require the full verbose syntax: the
> TABLE keyword for table functions and the EXTEND keyword for extends
> clauses. Then Drill can override to make TABLE optional, and Phoenix can
> override to make EXTEND optional.
>
> Are you changing the parser in your forked copy of Calcite, or are you
> changing Drill’s extensions to that parser?
>
> If the former, you (or I) should add extension points to Calcite’s parser
> make the TABLE keyword optional and to make the EXTEND keyword optional. No
> project should enable both extension points — otherwise they’ll end up with
> an ambiguous grammar. If you agree create a Calcite JIRA case for this.
>
> Julian
>
>
> > On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > Hi,
> > I've been trying to enable this but it looks like in the current grammar
> > (before my change) you can not use table functions and EXTEND together.
> > That's because they are on difference branches of an | in the grammar.
> > So I would suggest that we treat those as two separate improvement in two
> > different pull requests:
> > - not require table(...) to call table functions
> > - allow using table functions and extend together.
> > Does it make sense?
> > Julien
> >
> >
> > On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> To be clear, it should be possible to use a table function with all of
> >> the options -- EXTENDS clause, OVER clause, AS with alias and column
> >> aliases, TABLESAMPLE.
> >>
> >> I'm surprised that the parser didn't need more lookahead to choose
> >> between 't (x, y)' and 't (x INTEGER, y DATE)'.
> >>
> >> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >>> In the patch I just sent, probably not.
> >>> I will adjust it and add the corresponding test.
> >>>
> >>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>
> >>>> Can you use both together? Say
> >>>>
> >>>>  select columns
> >>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|’)
> >> EXTEND
> >>>> (foo INTEGER)
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >>>>>
> >>>>> I took a stab at adding the TableFunction syntax without table(...)
> in
> >>>>> Calcite.
> >>>>> I have verified that both the table function and extend (with or
> >> without
> >>>>> keyword) work
> >>>>>
> >>>>
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >>>>>
> >>>>> These work:
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> >> fieldDelimiter
> >>>> =>
> >>>>> '|')
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >>>>> fieldDelimiter => '|'))
> >>>>>
> >>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`('JSON')
> >>>>>
> >>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >>>>>
> >>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> >>>> wrote:
> >>>>>
> >>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
> >> we
> >>>>>> should constrain ourselves to having a union of the two syntaxes.
> >>>>>>
> >>>>>>
> >>>>>> That being said, I think we could make these work together... maybe.
> >>>>>>
> >>>>>> Remove the EXTENDS without keyword syntax from the grammar.
> >>>>>>
> >>>>>> Create a new sub block in the table block that requires no keyword.
> >>>> There
> >>>>>> would be two paths (and would probably require some lookahead)
> >>>>>>
> >>>>>> option 1> unnamed parameters (1,2,3)
> >>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
> >>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >>>>>> golfHandicap INTEGER)
> >>>>>>
> >>>>>> Then we create a table function with options 1 & 2, an EXTENDS
> clause
> >>>> for
> >>>>>> option 3.
> >>>>>>
> >>>>>> Best of both worlds?
> >>>>>>
> >>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <
> jamestaylor@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Phoenix already supports columns at read-time using the syntax
> >> without
> >>>>>> the
> >>>>>>> EXTENDS keyword as Julian indicated:
> >>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>  WHERE goldHandicap < 10;
> >>>>>>>
> >>>>>>> Changing this by requiring the EXTENDS keyword would create a
> >> backward
> >>>>>>> compatibility problem.
> >>>>>>>
> >>>>>>> I think it'd be good if both of these extensions worked in Drill &
> >>>>>> Phoenix
> >>>>>>> given our Drillix initiative.
> >>>>>>>
> >>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <jacques@dremio.com
> >
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> My proposal was an a or b using the freemarker template in the
> >>>> grammar,
> >>>>>>>> not something later.
> >>>>>>>>
> >>>>>>>> Actually, put another way: we may want to consider stating that we
> >>>> only
> >>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
> >>>> should
> >>>>>>> be
> >>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
> >> (the
> >>>>>>> same
> >>>>>>>> way we plug in external things in Drill).
> >>>>>>>>
> >>>>>>>> Trying to get every project to agree on extensions seems like it
> >> may
> >>>> be
> >>>>>>>> hard.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jacques Nadeau
> >>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>
> >>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> I can see why Jacques wants this syntax.
> >>>>>>>>>
> >>>>>>>>> However a “switch" in a grammar is a bad idea. Grammars need to
> be
> >>>>>>>>> predictable. Any variation should happen at validation time, or
> >>>> later.
> >>>>>>>>>
> >>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
> >> avoiding
> >>>> a
> >>>>>>>>> tough design discussion.
> >>>>>>>>>
> >>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL,
> and
> >>>>>> they
> >>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
> >> Phoenix
> >>>>>> (by
> >>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
> >> SQL
> >>>>>>> syntax
> >>>>>>>>> should be.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
> >> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Looking at those two examples I agree with Jacques. The first
> >>>>>> appears
> >>>>>>>>> more
> >>>>>>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
> >> jacques@dremio.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
> >> we
> >>>>>>> could
> >>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
> >> seems
> >>>>>>>>>>> reasonable to support the table functions without the TABLE
> >> syntax.
> >>>>>>>>>>>
> >>>>>>>>>>> I for one definitely think the TABLE syntax is much more
> >> confusing
> >>>>>> to
> >>>>>>>>> use,
> >>>>>>>>>>> especially in the example that we're looking to support, such
> >> as:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>> fieldDelimiter
> >>>>>>> =>
> >>>>>>>>>>> '|', skipFirstRow => true)
> >>>>>>>>>>>
> >>>>>>>>>>> This seems much clearer than:
> >>>>>>>>>>>
> >>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>>>>> fieldDelimiter
> >>>>>>>>>>> => '|', skipFirstRow => true))
> >>>>>>>>>>>
> >>>>>>>>>>> It also looks much more like a hint to the table (which is our
> >>>>>> goal).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for doing the legwork and finding what the other
> vendors
> >>>>>> do.
> >>>>>>>>> It is
> >>>>>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the
> >>>>>>> standard
> >>>>>>>>> an
> >>>>>>>>>>>> make the TABLE keyword optional.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
> >>>>>> clash
> >>>>>>>>> with
> >>>>>>>>>>>> one of our own (few) extensions. In
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added
> the
> >>>>>>>>> EXTENDS
> >>>>>>>>>>>> clause. You can write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap
> >> INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
> >> Emp
> >>>>>>> table
> >>>>>>>>>>> but
> >>>>>>>>>>>> you would like to use them in this particular query. We chose
> >> to
> >>>>>>> make
> >>>>>>>>> the
> >>>>>>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>>>>>> WHERE goldHandicap < 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> That is uncomfortably close to
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>>>>>
> >>>>>>>>>>>> so we would require
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>>>>>
> >>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
> >>>>>> forms
> >>>>>>>>> like
> >>>>>>>>>>>> this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SELECT *
> >>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >>>>>>> should
> >>>>>>>>>>> also
> >>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <julien@dremio.com
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems
> >>>>>> there's
> >>>>>>> no
> >>>>>>>>>>>>> consensus about this.
> >>>>>>>>>>>>> It seems that Posgres [1] and SQL Server [2] both allow
> >> calling
> >>>>>>> table
> >>>>>>>>>>>>> functions without the table(...) wrapper while Oracle [3] and
> >> DB2
> >>>>>>> [4]
> >>>>>>>>>>>>> expect it.
> >>>>>>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar
> >>>>>> generation
> >>>>>>> to
> >>>>>>>>>>>> have
> >>>>>>>>>>>>> a posgres compatible syntax? Currently in Drill we use the
> >> MySQL
> >>>>>>> like
> >>>>>>>>>>>>> syntax (back ticks for identifiers etc)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>>>>>> [2]
> >>>>>>>>>>>
> >>>>>> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>>>>>> [3]
> >>>>>>> https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>>>>>> [4]
> >>>>>>> http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>>>>>> [5]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
> >> function
> >>>>>>>>>>>>> overloading:
> https://github.com/apache/calcite/pull/166/files
> >>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
> >>>>>> called
> >>>>>>>>> more
> >>>>>>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
> >> want
> >>>>>>> to
> >>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll file a JIRA for my other branch
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <
> jhyde@apache.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <
> julien@dremio.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >>>>>> Calcite
> >>>>>>>>> when
> >>>>>>>>>>>>>> there's more than 1 function with the same name.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes; see below.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
> >> now
> >>>>>>> just
> >>>>>>>>>>>> being
> >>>>>>>>>>>>>> able to specify the delimiter for csv files.
> >>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
> >> are
> >>>>>>> the
> >>>>>>>>>>> way
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> go.
> >>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
> >> syntax
> >>>>>>>>>>>>>> necessary?*
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Consider:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
> >> similar.
> >>>>>>> Also,
> >>>>>>>>>>> if
> >>>>>>>>>>>> f
> >>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
> >>>>>> this?:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> select * from f
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >>>>>>> standards
> >>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
> >> disambiguate.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
> >> used
> >>>>>>> in
> >>>>>>>>>>>>>> Calcite for the Maze example.
> >>>>>>>>>>>>>> Which is why some hooks were missing.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix
> it.
> >>>>>>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name,
> we
> >>>>>> get
> >>>>>>> a
> >>>>>>>>>>> NPE
> >>>>>>>>>>>>>> later on.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
> >> a
> >>>>>>> JIRA
> >>>>>>>>>>> case
> >>>>>>>>>>>>>> on resolution of overloaded functions when invoked with
> named
> >>>>>>>>>>> arguments.
> >>>>>>>>>>>>>> (It probably applies to all functions, not just table
> >>>>>> functions.)
> >>>>>>>>> The
> >>>>>>>>>>>> fix
> >>>>>>>>>>>>>> will take a while (if you wait for me to write it).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Julien
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> *Jim Scott*
> >>>>>>>>>> Director, Enterprise Strategy & Architecture
> >>>>>>>>>> +1 (347) 746-9281
> >>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Julien
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Julien
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

You’re hitting the grammar ambiguity I expected.

I think that base Calcite should require the full verbose syntax: the TABLE keyword for table functions and the EXTEND keyword for extends clauses. Then Drill can override to make TABLE optional, and Phoenix can override to make EXTEND optional.
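
As a rough illustration of why the two optional keywords collide (a sketch only, not Calcite's JavaCC grammar): once both TABLE and EXTEND may be omitted, a parenthesized list after a table name can be positional arguments, named arguments, or column declarations, and the parser has to look past the first token to tell them apart. The `tokenize` and `classify` helpers below are hypothetical names invented for this example.

```python
import re

# Hypothetical token pattern for this illustration only; Calcite's real
# parser is generated from a JavaCC grammar and works differently.
TOKEN = re.compile(r"\s*(=>|[(),]|'[^']*'|\w+)")


def tokenize(text):
    """Split a parenthesized list such as "(x INTEGER, y DATE)" into tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def classify(paren_list):
    """Guess what the list after a table name means, using two tokens of lookahead."""
    tokens = tokenize(paren_list)
    if len(tokens) < 3 or tokens[0] != "(":
        raise ValueError("expected a non-empty parenthesized list")
    first, second = tokens[1], tokens[2]
    if second == "=>":
        # e.g. (type => 'TEXT', ...): a table function call with named arguments
        return "named arguments"
    if second in (",", ")"):
        # e.g. (x, y) or ('JSON'): a table function call with positional arguments
        return "positional arguments"
    if first.isidentifier() and second.isidentifier():
        # e.g. (x INTEGER, y DATE): a name followed by a type, as in EXTEND
        return "column declarations"
    raise ValueError("cannot classify this list with two tokens of lookahead")


print(classify("(x, y)"))
print(classify("(x INTEGER, y DATE)"))
print(classify("(type => 'TEXT', fieldDelimiter => '|')"))
```

The sketch deliberately ignores general expressions as arguments and zero-argument calls; it only shows that the decision cannot be made on the opening parenthesis alone.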

Are you changing the parser in your forked copy of Calcite, or are you changing Drill’s extensions to that parser?

If the former, you (or I) should add extension points to Calcite’s parser to make the TABLE keyword optional and to make the EXTEND keyword optional. No project should enable both extension points; otherwise they’ll end up with an ambiguous grammar. If you agree, please create a Calcite JIRA case for this.
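
The "never both" rule can be sketched as a small configuration check. This is only an illustration under assumed names: `ParserExtensions` and its two flags are hypothetical and are not part of Calcite's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserExtensions:
    """Hypothetical switches for the two proposed extension points."""
    table_keyword_optional: bool = False   # a Drill-style override
    extend_keyword_optional: bool = False  # a Phoenix-style override

    def __post_init__(self):
        # Enabling both would make "name (list)" ambiguous between a table
        # function call and an EXTEND column list, so reject that combination.
        if self.table_keyword_optional and self.extend_keyword_optional:
            raise ValueError(
                "TABLE and EXTEND cannot both be optional: the grammar would be ambiguous"
            )


BASE = ParserExtensions()                                   # full verbose syntax
DRILL_LIKE = ParserExtensions(table_keyword_optional=True)
PHOENIX_LIKE = ParserExtensions(extend_keyword_optional=True)
```

Rejecting the combination at configuration time keeps each project's generated grammar unambiguous, at the cost of the two dialects not being usable in one parser.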

Julian

 
> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Hi,
> I've been trying to enable this, but it looks like in the current grammar
> (before my change) you cannot use table functions and EXTEND together.
> That's because they are on different branches of an | in the grammar.
> So I would suggest that we treat those as two separate improvements in two
> different pull requests:
> - do not require table(...) to call table functions
> - allow using table functions and EXTEND together.
> Does that make sense?
> Julien
> 
> 
> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> To be clear, it should be possible to use a table function with all of
>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>> aliases, TABLESAMPLE.
>> 
>> I'm surprised that the parser didn't need more lookahead to choose
>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>> 
>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> In the patch I just sent, probably not.
>>> I will adjust it and add the corresponding test.
>>> 
>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> Can you use both together? Say
>>>> 
>>>>  select columns
>>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
>>>>  EXTEND (foo INTEGER)
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> 
>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>> Calcite.
>>>>> I have verified that both the table function and EXTEND (with or without
>>>>> the keyword) work:
>>>>>
>>>>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>> 
>>>>> These work:
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|')
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|'))
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>> we
>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>> 
>>>>>> 
>>>>>> That being said, I think we could make these work together... maybe.
>>>>>> 
>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>> 
>>>>>> Create a new sub block in the table block that requires no keyword.
>>>> There
>>>>>> would be two paths (and would probably require some lookahead)
>>>>>> 
>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>> golfHandicap INTEGER)
>>>>>> 
>>>>>> Then we create a table function with options 1 & 2, an EXTENDS clause
>>>> for
>>>>>> option 3.
>>>>>> 
>>>>>> Best of both worlds?
>>>>>> 
>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Phoenix already supports columns at read-time using the syntax
>> without
>>>>>> the
>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>  WHERE golfHandicap < 10;
>>>>>>> 
>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>> backward
>>>>>>> compatibility problem.
>>>>>>> 
>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>> Phoenix
>>>>>>> given our Drillix initiative.
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> My proposal was an either/or choice using the FreeMarker template in the
>>>>>>>> grammar, not something later.
>>>>>>>> 
>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>> only
>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>> should
>>>>>>> be
>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>> (the
>>>>>>> same
>>>>>>>> way we plug in external things in Drill).
>>>>>>>> 
>>>>>>>> Trying to get every project to agree on extensions seems like it
>> may
>>>> be
>>>>>>>> hard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>> 
>>>>>>>>> However, a "switch" in a grammar is a bad idea. Grammars need to be
>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>> later.
>>>>>>>>> 
>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>> avoiding
>>>> a
>>>>>>>>> tough design discussion.
>>>>>>>>> 
>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>>>>>> they
>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>> Phoenix
>>>>>> (by
>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>> SQL
>>>>>>> syntax
>>>>>>>>> should be.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>> appears
>>>>>>>>> more
>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>> jacques@dremio.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>> we
>>>>>>> could
>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>> seems
>>>>>>>>>>> reasonable to support the table functions without the TABLE
>> syntax.
>>>>>>>>>>> 
>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>> confusing
>>>>>> to
>>>>>>>>> use,
>>>>>>>>>>> especially in the example that we're looking to support, such
>> as:
>>>>>>>>>>> 
>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter => '|', skipFirstRow => true)
>>>>>>>>>>> 
>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>> 
>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter => '|', skipFirstRow => true))
>>>>>>>>>>> 
>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>> goal).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors do.
>>>>>>>>>>>> It is indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>>>>>> standard and make the TABLE keyword optional.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>> clash
>>>>>>>>> with
>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>>>>>> EXTENDS
>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>> Emp
>>>>>>> table
>>>>>>>>>>> but
>>>>>>>>>>>> you would like to use them in this particular query. We chose
>> to
>>>>>>> make
>>>>>>>>> the
>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>> 
>>>>>>>>>>>> so we would require
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>> 
>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>> forms
>>>>>>>>> like
>>>>>>>>>>>> this:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>> 
>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>> should
>>>>>>>>>>> also
>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>>>>>>>>>>>>> functions without the table(...) wrapper, while Oracle [3] and DB2 [4]
>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>> MySQL does not have table functions [5].
>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to
>>>>>>>>>>>>> have a Postgres-compatible syntax? Currently in Drill we use the
>>>>>>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>> function
>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>> called
>>>>>>>>> more
>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>> want
>>>>>>> to
>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll file a JIRA for my other branch
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Julien
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>> Calcite
>>>>>>>>> when
>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>> now
>>>>>>> just
>>>>>>>>>>>> being
>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>> are
>>>>>>> the
>>>>>>>>>>> way
>>>>>>>>>>>> to
>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>> syntax
>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>> similar.
>>>>>>> Also,
>>>>>>>>>>> if
>>>>>>>>>>>> f
>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>> this?:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>> standards
>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>> disambiguate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>> used
>>>>>>> in
>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>>>>>> get
>>>>>>> a
>>>>>>>>>>> NPE
>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>> a
>>>>>>> JIRA
>>>>>>>>>>> case
>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>> functions.)
>>>>>>>>> The
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Jim Scott*
>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

You’re hitting the grammar ambiguity I expected.

I think that base Calcite should require the full verbose syntax: the TABLE keyword for table functions and the EXTEND keyword for extends clauses. Then Drill can override to make TABLE optional, and Phoenix can override to make EXTEND optional.

Are you changing the parser in your forked copy of Calcite, or are you changing Drill’s extensions to that parser?

If the former, you (or I) should add extension points to Calcite’s parser to make the TABLE keyword optional and to make the EXTEND keyword optional. No project should enable both extension points; otherwise they’ll end up with an ambiguous grammar. If you agree, please create a Calcite JIRA case for this.

Julian

 
> On Nov 11, 2015, at 1:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Hi,
> I've been trying to enable this, but it looks like in the current grammar
> (before my change) you cannot use table functions and EXTEND together.
> That's because they are on different branches of an | in the grammar.
> So I would suggest that we treat those as two separate improvements in two
> different pull requests:
> - do not require table(...) to call table functions
> - allow using table functions and EXTEND together.
> Does that make sense?
> Julien
> 
> 
> On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> To be clear, it should be possible to use a table function with all of
>> the options -- EXTENDS clause, OVER clause, AS with alias and column
>> aliases, TABLESAMPLE.
>> 
>> I'm surprised that the parser didn't need more lookahead to choose
>> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>> 
>> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> In the patch I just sent, probably not.
>>> I will adjust it and add the corresponding test.
>>> 
>>> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>>> Can you use both together? Say
>>>> 
>>>>  select columns
>>>>  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
>>>>  EXTEND (foo INTEGER)
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>>>>> 
>>>>> I took a stab at adding the TableFunction syntax without table(...) in
>>>>> Calcite.
>>>>> I have verified that both the table function and EXTEND (with or without
>>>>> the keyword) work:
>>>>>
>>>>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>>>>> 
>>>>> These work:
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|')
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>>>>> fieldDelimiter => '|'))
>>>>> 
>>>>> select columns from table(dfs.`/path/to/myfile`('JSON'))
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`('JSON')
>>>>> 
>>>>> select columns from dfs.`/path/to/myfile`(type => 'JSON')
>>>>> 
>>>>> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Drill does implicitly what Phoenix does explicitly so I don't think
>> we
>>>>>> should constrain ourselves to having a union of the two syntaxes.
>>>>>> 
>>>>>> 
>>>>>> That being said, I think we could make these work together... maybe.
>>>>>> 
>>>>>> Remove the EXTENDS without keyword syntax from the grammar.
>>>>>> 
>>>>>> Create a new sub block in the table block that requires no keyword.
>>>> There
>>>>>> would be two paths (and would probably require some lookahead)
>>>>>> 
>>>>>> option 1> unnamed parameters (1,2,3)
>>>>>> option 2> named parameters (a => 1, b=>2, c=> 3)
>>>>>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>>>>>> golfHandicap INTEGER)
>>>>>> 
>>>>>> Then we create a table function with options 1 & 2, an EXTENDS clause
>>>> for
>>>>>> option 3.
>>>>>> 
>>>>>> Best of both worlds?
>>>>>> 
>>>>>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Phoenix already supports columns at read-time using the syntax
>> without
>>>>>> the
>>>>>>> EXTENDS keyword as Julian indicated:
>>>>>>>  SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>  WHERE golfHandicap < 10;
>>>>>>> 
>>>>>>> Changing this by requiring the EXTENDS keyword would create a
>> backward
>>>>>>> compatibility problem.
>>>>>>> 
>>>>>>> I think it'd be good if both of these extensions worked in Drill &
>>>>>> Phoenix
>>>>>>> given our Drillix initiative.
>>>>>>> 
>>>>>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> My proposal was an either/or choice using the FreeMarker template in the
>>>>>>>> grammar, not something later.
>>>>>>>> 
>>>>>>>> Actually, put another way: we may want to consider stating that we
>>>> only
>>>>>>>> incorporate SQL standards in our primary grammar. Any extensions
>>>> should
>>>>>>> be
>>>>>>>> optional grammar. We could simply have grammar plugins in Calcite
>> (the
>>>>>>> same
>>>>>>>> way we plug in external things in Drill).
>>>>>>>> 
>>>>>>>> Trying to get every project to agree on extensions seems like it
>> may
>>>> be
>>>>>>>> hard.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>> 
>>>>>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>>> I can see why Jacques wants this syntax.
>>>>>>>>> 
>>>>>>>>> However, a "switch" in a grammar is a bad idea. Grammars need to be
>>>>>>>>> predictable. Any variation should happen at validation time, or
>>>> later.
>>>>>>>>> 
>>>>>>>>> Also, we shouldn’t add configuration parameters as a way of
>> avoiding
>>>> a
>>>>>>>>> tough design discussion.
>>>>>>>>> 
>>>>>>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>>>>>> they
>>>>>>>>> are both applicable to Drill and Phoenix. I think Drill and
>> Phoenix
>>>>>> (by
>>>>>>>>> which I mean Jacques and James, I guess) need to agree what the
>> SQL
>>>>>>> syntax
>>>>>>>>> should be.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Looking at those two examples I agree with Jacques. The first
>>>>>> appears
>>>>>>>>> more
>>>>>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
>> jacques@dremio.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Since EXTEND is custom functionality, it seems reasonable that
>> we
>>>>>>> could
>>>>>>>>>>> have a switch. Given that SQL Server and Postgres support it
>> seems
>>>>>>>>>>> reasonable to support the table functions without the TABLE
>> syntax.
>>>>>>>>>>> 
>>>>>>>>>>> I for one definitely think the TABLE syntax is much more
>> confusing
>>>>>> to
>>>>>>>>> use,
>>>>>>>>>>> especially in the example that we're looking to support, such
>> as:
>>>>>>>>>>> 
>>>>>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter => '|', skipFirstRow => true)
>>>>>>>>>>> 
>>>>>>>>>>> This seems much clearer than:
>>>>>>>>>>> 
>>>>>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>>>>>> fieldDelimiter => '|', skipFirstRow => true))
>>>>>>>>>>> 
>>>>>>>>>>> It also looks much more like a hint to the table (which is our
>>>>>> goal).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for doing the legwork and finding what the other vendors do.
>>>>>>>>>>>> It is indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>>>>>> standard and make the TABLE keyword optional.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried that syntax in Calcite and discovered that there is a
>>>>>> clash
>>>>>>>>> with
>>>>>>>>>>>> one of our own (few) extensions. In
>>>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>>>>>> EXTENDS
>>>>>>>>>>>> clause. You can write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> to tell Calcite that there are two undeclared columns in the
>> Emp
>>>>>>> table
>>>>>>>>>>> but
>>>>>>>>>>>> you would like to use them in this particular query. We chose
>> to
>>>>>>> make
>>>>>>>>> the
>>>>>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> That is uncomfortably close to
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>>>>>> 
>>>>>>>>>>>> so we would require
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>>>>>> 
>>>>>>>>>>>> if EmpFunction was a table-function. You could combine the two
>>>>>> forms
>>>>>>>>> like
>>>>>>>>>>>> this:
>>>>>>>>>>>> 
>>>>>>>>>>>> SELECT *
>>>>>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>>>>>> 
>>>>>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>>>>>> should
>>>>>>>>>>> also
>>>>>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
>>>>>>>>>>>>> consensus about this.
>>>>>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>>>>>>>>>>>>> functions without the table(...) wrapper, while Oracle [3] and DB2 [4]
>>>>>>>>>>>>> expect it.
>>>>>>>>>>>>> MySQL does not have table functions [5].
>>>>>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to
>>>>>>>>>>>>> have a Postgres-compatible syntax? Currently in Drill we use the
>>>>>>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
>> function
>>>>>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>>>>>> But that seems too easy to be true. Possibly this method is
>>>>>> called
>>>>>>>>> more
>>>>>>>>>>>>> than once (before and after the function has been resolved?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> FYI this would happen only when using named parameter. We do
>> want
>>>>>>> to
>>>>>>>>>>>>> overload in this case, which is why I'm looking into it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'll file a JIRA for my other branch
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Julien
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>>>>>> Calcite
>>>>>>>>> when
>>>>>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes; see below.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
>> now
>>>>>>> just
>>>>>>>>>>>> being
>>>>>>>>>>>>>> able to specify the delimiter for csv files.
>>>>>>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
>> are
>>>>>>> the
>>>>>>>>>>> way
>>>>>>>>>>>> to
>>>>>>>>>>>>>> go.
>>>>>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
>> syntax
>>>>>>>>>>>>>> necessary?*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
>> similar.
>>>>>>> Also,
>>>>>>>>>>> if
>>>>>>>>>>>> f
>>>>>>>>>>>>>> is a function with zero arguments, could you invoke it like
>>>>>> this?:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> select * from f
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>>>>>> standards
>>>>>>>>>>>>>> people in their wisdom decided to add a keyword to
>> disambiguate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
>> used
>>>>>>> in
>>>>>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>>>>>> get
>>>>>>> a
>>>>>>>>>>> NPE
>>>>>>>>>>>>>> later on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
>> a
>>>>>>> JIRA
>>>>>>>>>>> case
>>>>>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>>>>>> arguments.
>>>>>>>>>>>>>> (It probably applies to all functions, not just table
>>>>>> functions.)
>>>>>>>>> The
>>>>>>>>>>>> fix
>>>>>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Julien
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> *Jim Scott*
>>>>>>>>>> Director, Enterprise Strategy & Architecture
>>>>>>>>>> +1 (347) 746-9281
>>>>>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>>>>>> 
>>>>>>>>>> <http://www.mapr.com/>
>>>>>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Julien
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Hi,
I've been trying to enable this, but it looks like in the current grammar
(before my change) you cannot use table functions and EXTEND together.
That's because they are on different branches of an | in the grammar.
So I would suggest that we treat those as two separate improvements in two
different pull requests:
 - do not require table(...) to call table functions
 - allow using table functions and EXTEND together.
Does that make sense?
Julien
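
To make the "different branches of an |" point concrete, here is a minimal
sketch in Python. It is a toy recursive-descent parser, not Calcite's JavaCC
grammar; the names parse, Parser and shared_suffix are hypothetical and exist
only for this illustration. With shared_suffix=False, EXTEND is reachable only
from the plain-table branch, so TABLE(f(...)) EXTEND (...) is rejected. With
shared_suffix=True, the optional EXTEND suffix is factored out after both
branches.

```python
import re

_TOKEN = re.compile(r"\s*([(),]|\w+)")


def tokenize(sql):
    """Split a tiny SQL fragment into parentheses, commas and words."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = _TOKEN.match(sql, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Toy grammar (an assumption for illustration, not Calcite's):

    branching:     table_ref := TABLE ( call ) | name [ EXTEND cols ]
    shared suffix: table_ref := ( TABLE ( call ) | name ) [ EXTEND cols ]
    """

    def __init__(self, sql, shared_suffix):
        self.toks = tokenize(sql)
        self.i = 0
        self.shared_suffix = shared_suffix

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok.upper() != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def table_ref(self):
        is_call = (self.peek() or "").upper() == "TABLE"
        if is_call:
            self.take("TABLE")
            self.take("(")
            node = self.call()
            self.take(")")
        else:
            node = ("table", self.take())
        # Branching shape: EXTEND is only reachable from the plain-table branch.
        if self.shared_suffix or not is_call:
            node = self.extend(node)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return node

    def call(self):
        name = self.take()
        self.take("(")
        args = [self.take()]
        while self.peek() == ",":
            self.take(",")
            args.append(self.take())
        self.take(")")
        return ("call", name, args)

    def extend(self, node):
        if (self.peek() or "").upper() != "EXTEND":
            return node
        self.take("EXTEND")
        self.take("(")
        cols = [(self.take(), self.take())]  # (column name, type) pairs
        while self.peek() == ",":
            self.take(",")
            cols.append((self.take(), self.take()))
        self.take(")")
        return ("extend", node, cols)


def parse(sql, shared_suffix):
    return Parser(sql, shared_suffix).table_ref()
```

Calling parse("TABLE(f(x, y)) EXTEND (foo INTEGER)", False) raises a
SyntaxError in this toy, while the same input with shared_suffix=True parses.
Making TABLE optional is a separate change from this one, which is the reason
for proposing two pull requests.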


On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde <jh...@apache.org> wrote:

> To be clear, it should be possible to use a table function with all of
> the options -- EXTENDS clause, OVER clause, AS with alias and column
> aliases, TABLESAMPLE.
>
> I'm surprised that the parser didn't need more lookahead to choose
> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>
> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > In the patch I just sent, probably not.
> > I will adjust it and add the corresponding test.
> >
> > On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> Can you use both together? Say
> >>
> >>   select columns
> >>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
> >>   EXTEND (foo INTEGER)
> >>
> >> Julian
> >>
> >>
> >>
> >> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >> >
> >> > I took a stab at adding the TableFunction syntax without table(...) in
> >> > Calcite.
> >> > I have verified that both the table function and extend (with or
> without
> >> > keyword) work
> >> >
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >> >
> >> > These work:
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter
> >> =>
> >> > '|')
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >> > fieldDelimiter => '|'))
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >> >
> >> > select columns from dfs.`/path/to/myfile`('JSON')
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >> >
> >> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >> >
> >> >> Drill does implicitly what Phoenix does explicitly so I don't think
> we
> >> >> should constrain ourselves to having a union of the two syntaxes.
> >> >>
> >> >>
> >> >> That being said, I think we could make these work together... maybe.
> >> >>
> >> >> Remove the EXTENDS without keyword syntax from the grammar.
> >> >>
> >> >> Create a new sub block in the table block that requires no keyword.
> >> There
> >> >> would be two paths (and would probably require some lookahead)
> >> >>
> >> >> option 1> unnamed parameters (1,2,3)
> >> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> >> golfHandicap INTEGER)
> >> >>
> >> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> >> for
> >> >> option 3.
> >> >>
> >> >> Best of both worlds?
> >> >>
> >> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <jamestaylor@apache.org
> >
> >> >> wrote:
> >> >>
> >> >>> Phoenix already supports columns at read-time using the syntax
> without
> >> >> the
> >> >>> EXTENDS keyword as Julian indicated:
> >> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>   WHERE golfHandicap < 10;
> >> >>>
> >> >>> Changing this by requiring the EXTENDS keyword would create a
> backward
> >> >>> compatibility problem.
> >> >>>
> >> >>> I think it'd be good if both of these extensions worked in Drill &
> >> >> Phoenix
> >> >>> given our Drillix initiative.
> >> >>>
> >> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> >> >> wrote:
> >> >>>
> >> >>>> My proposal was an a or b using the freemarker template in the
> >> grammar,
> >> >>>> not something later.
> >> >>>>
> >> >>>> Actually, put another way: we may want to consider stating that we
> >> only
> >> >>>> incorporate SQL standards in our primary grammar. Any extensions
> >> should
> >> >>> be
> >> >>>> optional grammar. We could simply have grammar plugins in Calcite
> (the
> >> >>> same
> >> >>>> way we plug in external things in Drill).
> >> >>>>
> >> >>>> Trying to get every project to agree on extensions seems like it
> may
> >> be
> >> >>>> hard.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Jacques Nadeau
> >> >>>> CTO and Co-Founder, Dremio
> >> >>>>
> >> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >> >>>>
> >> >>>>> I can see why Jacques wants this syntax.
> >> >>>>>
> >> >>>>> However, a "switch" in a grammar is a bad idea. Grammars need to be
> >> >>>>> predictable. Any variation should happen at validation time, or later.
> >> >>>>>
> >> >>>>> Also, we shouldn’t add configuration parameters as a way of
> avoiding
> >> a
> >> >>>>> tough design discussion.
> >> >>>>>
> >> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> >> they
> >> >>>>> are both applicable to Drill and Phoenix. I think Drill and
> Phoenix
> >> >> (by
> >> >>>>> which I mean Jacques and James, I guess) need to agree what the
> SQL
> >> >>> syntax
> >> >>>>> should be.
> >> >>>>>
> >> >>>>> Julian
> >> >>>>>
> >> >>>>>
> >> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com>
> wrote:
> >> >>>>>>
> >> >>>>>> Looking at those two examples I agree with Jacques. The first
> >> >> appears
> >> >>>>> more
> >> >>>>>> like a hint from the syntactic sugar point of view.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <
> jacques@dremio.com
> >> >>>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that
> we
> >> >>> could
> >> >>>>>>> have a switch. Given that SQL Server and Postgres support it
> seems
> >> >>>>>>> reasonable to support the table functions without the TABLE
> syntax.
> >> >>>>>>>
> >> >>>>>>> I for one definitely think the TABLE syntax is much more
> confusing
> >> >> to
> >> >>>>> use,
> >> >>>>>>> especially in the example that we're looking to support, such
> as:
> >> >>>>>>>
> >> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> >> fieldDelimiter
> >> >>> =>
> >> >>>>>>> '|', skipFirstRow => true)
> >> >>>>>>>
> >> >>>>>>> This seems much clearer than:
> >> >>>>>>>
> >> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >> >>>>> fieldDelimiter
> >> >>>>>>> => '|', skipFirstRow => true))
> >> >>>>>>>
> >> >>>>>>> It also looks much more like a hint to the table (which is our
> >> >> goal).
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Jacques Nadeau
> >> >>>>>>> CTO and Co-Founder, Dremio
> >> >>>>>>>
> >> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >> >>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Thanks for doing the legwork and finding what the other vendors do.
> >> >>>>>>>> It is indeed compelling that SQL Server and Postgres go beyond the
> >> >>>>>>>> standard and make the TABLE keyword optional.
> >> >>>>>>>>
> >> >>>>>>>> I tried that syntax in Calcite and discovered that there is a clash
> >> >>>>>>>> with one of our own (few) extensions. In
> >> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> >> >>>>>>>> clause. You can write
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>>>>>> WHERE golfHandicap < 10;
> >> >>>>>>>>
> >> >>>>>>>> to tell Calcite that there are two undeclared columns in the
> Emp
> >> >>> table
> >> >>>>>>> but
> >> >>>>>>>> you would like to use them in this particular query. We chose
> to
> >> >>> make
> >> >>>>> the
> >> >>>>>>>> EXTEND keyword optional, so you could instead write
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>>>>>> WHERE golfHandicap < 10;
> >> >>>>>>>>
> >> >>>>>>>> That is uncomfortably close to
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >> >>>>>>>>
> >> >>>>>>>> so we would require
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >> >>>>>>>>
> >> >>>>>>>> if EmpFunction was a table-function. You could combine the two
> >> >> forms
> >> >>>>> like
> >> >>>>>>>> this:
> >> >>>>>>>>
> >> >>>>>>>> SELECT *
> >> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >> >>>>>>>> (anotherAttribute INTEGER);
> >> >>>>>>>>
> >> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
> >> >>> should
> >> >>>>>>> also
> >> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >> >>>>>>>>
> >> >>>>>>>> Julian
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> >> >>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
> >> >>>>>>>>> consensus about this.
> >> >>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
> >> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> >> >>>>>>>>> expect it.
> >> >>>>>>>>> MySQL does not have table functions [5].
> >> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >> >>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to
> >> >>>>>>>>> have a Postgres-compatible syntax? Currently in Drill we use the
> >> >>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
> >> >>>>>>>>>
> >> >>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >> >>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >> >>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >> >>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >> >>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >> >>>>>>>>>
> >> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the
> function
> >> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >> >>>>>>>>> But that seems too easy to be true. Possibly this method is
> >> >> called
> >> >>>>> more
> >> >>>>>>>>> than once (before and after the function has been resolved?)
> >> >>>>>>>>>
> >> >>>>>>>>> FYI this would happen only when using named parameter. We do
> want
> >> >>> to
> >> >>>>>>>>> overload in this case, which is why I'm looking into it.
> >> >>>>>>>>>
> >> >>>>>>>>> I'll file a JIRA for my other branch
> >> >>>>>>>>>
> >> >>>>>>>>> Julien
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jhyde@apache.org
> >
> >> >>>>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <julien@dremio.com
> >
> >> >>>>> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
> >> >> Calcite
> >> >>>>> when
> >> >>>>>>>>>> there's more than 1 function with the same name.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yes; see below.
> >> >>>>>>>>>>
> >> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For
> now
> >> >>> just
> >> >>>>>>>> being
> >> >>>>>>>>>> able to specify the delimiter for csv files.
> >> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros
> are
> >> >>> the
> >> >>>>>>> way
> >> >>>>>>>> to
> >> >>>>>>>>>> go.
> >> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping
> syntax
> >> >>>>>>>>>> necessary?*
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Consider:
> >> >>>>>>>>>>
> >> >>>>>>>>>> select * from myTable as f(x, y)
> >> >>>>>>>>>> select * from myTable f(x, y)
> >> >>>>>>>>>> select * from myFunction(x, y)
> >> >>>>>>>>>>
> >> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully
> similar.
> >> >>> Also,
> >> >>>>>>> if
> >> >>>>>>>> f
> >> >>>>>>>>>> is a function with zero arguments, could you invoke it like
> >> >> this?:
> >> >>>>>>>>>>
> >> >>>>>>>>>> select * from f
> >> >>>>>>>>>>
> >> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
> >> >>> standards
> >> >>>>>>>>>> people in their wisdom decided to add a keyword to
> disambiguate.
> >> >>>>>>>>>>
> >> >>>>>>>>>> I had to fix some things in Calcite to enable this:
> >> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be
> used
> >> >>> in
> >> >>>>>>>>>> Calcite for the Maze example.
> >> >>>>>>>>>> Which is why some hooks were missing.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Can you log a jira case to track this bug?
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >> >>>>>>>>>> Here is a test that reproduces the problem:
> >> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
> >> >> get
> >> >>> a
> >> >>>>>>> NPE
> >> >>>>>>>>>> later on.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log
> a
> >> >>> JIRA
> >> >>>>>>> case
> >> >>>>>>>>>> on resolution of overloaded functions when invoked with named
> >> >>>>>>> arguments.
> >> >>>>>>>>>> (It probably applies to all functions, not just table
> >> >> functions.)
> >> >>>>> The
> >> >>>>>>>> fix
> >> >>>>>>>>>> will take a while (if you wait for me to write it).
> >> >>>>>>>>>>
> >> >>>>>>>>>> For now please tell your users not to overload. :)
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Julian
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>> Julien
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> *Jim Scott*
> >> >>>>>> Director, Enterprise Strategy & Architecture
> >> >>>>>> +1 (347) 746-9281
> >> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >> >>>>>>
> >> >>>>>> <http://www.mapr.com/>
> >> >>>>>> [image: MapR Technologies] <http://www.mapr.com>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Julien
> >>
> >>
> >
> >
> > --
> > Julien
>



-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

To be clear, it should be possible to use a table function with all of
the options -- EXTENDS clause, OVER clause, AS with alias and column
aliases, TABLESAMPLE.

I'm surprised that the parser didn't need more lookahead to choose
between 't (x, y)' and 't (x INTEGER, y DATE)'.
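
As a rough illustration of the lookahead question, here is a minimal Python
sketch (not Calcite's parser; tokenize and classify are hypothetical helper
names, and the toy token set is an assumption). It peeks at the two tokens
after the opening parenthesis to tell positional arguments, named arguments
and column definitions apart. A real grammar allows full expressions as
arguments, so it may well need more than this.

```python
import re

_TOKEN = re.compile(r"\s*(=>|[(),]|'[^']*'|\w+)")


def tokenize(text):
    """Split a parenthesized list into '=>', punctuation, literals and words."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def classify(paren_list):
    """Classify the parenthesized list that follows a name in FROM.

    Only the second token after '(' is inspected:
      '=>'        -> named arguments       (a => 1, b => 2)
      ',' or ')'  -> positional arguments  (x, y)
      a word      -> column definitions    (x INTEGER, y DATE)
    """
    toks = tokenize(paren_list)
    if len(toks) < 3 or toks[0] != "(":
        raise SyntaxError("expected a non-empty parenthesized list")
    second = toks[2]
    if second == "=>":
        return "named arguments"
    if second in (",", ")"):
        return "positional arguments"
    if re.fullmatch(r"\w+", second):
        return "column definitions"
    raise SyntaxError(f"cannot classify list starting with {toks[1]!r} {second!r}")
```

In this toy, classify("(x, y)") and classify("(x INTEGER, y DATE)") diverge on
the second token after the parenthesis, so the choice cannot be made by looking
at the first token alone.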

On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> In the patch I just sent, probably not.
> I will adjust it and add the corresponding test.
>
> On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> Can you use both together? Say
>>
>>   select columns
>>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
>>   EXTEND (foo INTEGER)
>>
>> Julian
>>
>>
>>
>> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
>> >
>> > I took a stab at adding the TableFunction syntax without table(...) in
>> > Calcite.
>> > I have verified that both the table function and extend (with or without
>> > keyword) work
>> >
>> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
>> >
>> > These work:
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter
>> =>
>> > '|')
>> >
>> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
>> > fieldDelimiter => '|'))
>> >
>> > select columns from table(dfs.`/path/to/myfile`('JSON'))
>> >
>> > select columns from dfs.`/path/to/myfile`('JSON')
>> >
>> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
>> >
>> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>> >
>> >> Drill does implicitly what Phoenix does explicitly so I don't think we
>> >> should constrain ourselves to having a union of the two syntaxes.
>> >>
>> >>
>> >> That being said, I think we could make these work together... maybe.
>> >>
>> >> Remove the EXTENDS without keyword syntax from the grammar.
>> >>
>> >> Create a new sub block in the table block that requires no keyword.
>> There
>> >> would be two paths (and would probably require some lookahead)
>> >>
>> >> option 1> unnamed parameters (1,2,3)
>> >> option 2> named parameters (a => 1, b=>2, c=> 3)
>> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> >> golfHandicap INTEGER)
>> >>
>> >> Then we create a table function with options 1 & 2, an EXTENDS clause
>> for
>> >> option 3.
>> >>
>> >> Best of both worlds?
>> >>
>> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> >> wrote:
>> >>
>> >>> Phoenix already supports columns at read-time using the syntax without
>> >> the
>> >>> EXTENDS keyword as Julian indicated:
>> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>   WHERE golfHandicap < 10;
>> >>>
>> >>> Changing this by requiring the EXTENDS keyword would create a backward
>> >>> compatibility problem.
>> >>>
>> >>> I think it'd be good if both of these extensions worked in Drill &
>> >> Phoenix
>> >>> given our Drillix initiative.
>> >>>
>> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> >> wrote:
>> >>>
>> >>>> My proposal was an a or b using the freemarker template in the
>> grammar,
>> >>>> not something later.
>> >>>>
>> >>>> Actually, put another way: we may want to consider stating that we
>> only
>> >>>> incorporate SQL standards in our primary grammar. Any extensions
>> should
>> >>> be
>> >>>> optional grammar. We could simply have grammar plugins in Calcite (the
>> >>> same
>> >>>> way we plug in external things in Drill).
>> >>>>
>> >>>> Trying to get every project to agree on extensions seems like it may
>> be
>> >>>> hard.
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Jacques Nadeau
>> >>>> CTO and Co-Founder, Dremio
>> >>>>
>> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>
>> >>>>> I can see why Jacques wants this syntax.
>> >>>>>
>> >>>>> However, a "switch" in a grammar is a bad idea. Grammars need to be
>> >>>>> predictable. Any variation should happen at validation time, or later.
>> >>>>>
>> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding
>> a
>> >>>>> tough design discussion.
>> >>>>>
>> >>>>> EXTENDS and eliding TABLE are both extensions to standard SQL, and
>> >> they
>> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix
>> >> (by
>> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>> >>> syntax
>> >>>>> should be.
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>> >>>>>>
>> >>>>>> Looking at those two examples I agree with Jacques. The first
>> >> appears
>> >>>>> more
>> >>>>>> like a hint from the syntactic sugar point of view.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>> >>>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
>> >>> could
>> >>>>>>> have a switch. Given that SQL Server and Postgres support it seems
>> >>>>>>> reasonable to support the table functions without the TABLE syntax.
>> >>>>>>>
>> >>>>>>> I for one definitely think the TABLE syntax is much more confusing
>> >> to
>> >>>>> use,
>> >>>>>>> especially in the example that we're looking to support, such as:
>> >>>>>>>
>> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>> >> fieldDelimiter
>> >>> =>
>> >>>>>>> '|', skipFirstRow => true)
>> >>>>>>>
>> >>>>>>> This seems much clearer than:
>> >>>>>>>
>> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> >>>>> fieldDelimiter
>> >>>>>>> => '|', skipFirstRow => true))
>> >>>>>>>
>> >>>>>>> It also looks much more like a hint to the table (which is our
>> >> goal).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Jacques Nadeau
>> >>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>
>> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks for doing the legwork and finding what the other vendors do.
>> >>>>>>>> It is indeed compelling that SQL Server and Postgres go beyond the
>> >>>>>>>> standard and make the TABLE keyword optional.
>> >>>>>>>>
>> >>>>>>>> I tried that syntax in Calcite and discovered that there is a clash
>> >>>>>>>> with one of our own (few) extensions. In
>> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
>> >>>>>>>> clause. You can write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>>>>>> WHERE golfHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
>> >>> table
>> >>>>>>> but
>> >>>>>>>> you would like to use them in this particular query. We chose to
>> >>> make
>> >>>>> the
>> >>>>>>>> EXTEND keyword optional, so you could instead write
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>>>>>> WHERE golfHandicap < 10;
>> >>>>>>>>
>> >>>>>>>> That is uncomfortably close to
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>>>>>>
>> >>>>>>>> so we would require
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>>>>>>
>> >>>>>>>> if EmpFunction was a table-function. You could combine the two
>> >> forms
>> >>>>> like
>> >>>>>>>> this:
>> >>>>>>>>
>> >>>>>>>> SELECT *
>> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>>>>>>> (anotherAttribute INTEGER);
>> >>>>>>>>
>> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>> >>> should
>> >>>>>>> also
>> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>>>>>>
>> >>>>>>>> Julian
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>> >>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
>> >>>>>>>>> consensus about this.
>> >>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>> >>>>>>>>> expect it.
>> >>>>>>>>> MySQL does not have table functions [5].
>> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to
>> >>>>>>>>> have a Postgres-compatible syntax? Currently in Drill we use the
>> >>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
>> >>>>>>>>>
>> >>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>>>>>>
>> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>>>>>>> But that seems too easy to be true. Possibly this method is
>> >> called
>> >>>>> more
>> >>>>>>>>> than once (before and after the function has been resolved?)
>> >>>>>>>>>
>> >>>>>>>>> FYI this would happen only when using named parameter. We do want
>> >>> to
>> >>>>>>>>> overload in this case, which is why I'm looking into it.
>> >>>>>>>>>
>> >>>>>>>>> I'll file a JIRA for my other branch
>> >>>>>>>>>
>> >>>>>>>>> Julien
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>> >>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> >> Calcite
>> >>>>> when
>> >>>>>>>>>> there's more than 1 function with the same name.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes; see below.
>> >>>>>>>>>>
>> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now
>> >>> just
>> >>>>>>>> being
>> >>>>>>>>>> able to specify the delimiter for csv files.
>> >>>>>>>>>> So it seem the answer to my question 1) is that TableMacros are
>> >>> the
>> >>>>>>> way
>> >>>>>>>> to
>> >>>>>>>>>> go.
>> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>>>>>>> necessary?*
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Consider:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from myTable as f(x, y)
>> >>>>>>>>>> select * from myTable f(x, y)
>> >>>>>>>>>> select * from myFunction(x, y)
>> >>>>>>>>>>
>> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>> >>> Also,
>> >>>>>>> if
>> >>>>>>>> f
>> >>>>>>>>>> is a function with zero arguments, could you invoke it like
>> >> this?:
>> >>>>>>>>>>
>> >>>>>>>>>> select * from f
>> >>>>>>>>>>
>> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>> >>> standards
>> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>>>>>>
>> >>>>>>>>>> I had to fix some things in Calcite to enable this:
>> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>> >>> in
>> >>>>>>>>>> Calcite for the Maze example.
>> >>>>>>>>>> Which is why some hooks were missing.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Can you log a jira case to track this bug?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>>>>>>> Here is a test that reproduces the problem:
>> >>>>>>>>>> https://github.com/apache/calcite/pull/166
>> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> >> get
>> >>> a
>> >>>>>>> NPE
>> >>>>>>>>>> later on.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>> >>> JIRA
>> >>>>>>> case
>> >>>>>>>>>> on resolution of overloaded functions when invoked with named
>> >>>>>>> arguments.
>> >>>>>>>>>> (It probably applies to all functions, not just table
>> >> functions.)
>> >>>>> The
>> >>>>>>>> fix
>> >>>>>>>>>> will take a while (if you wait for me to write it).
>> >>>>>>>>>>
>> >>>>>>>>>> For now please tell your users not to overload. :)
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Julian
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Julien
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Jim Scott*
>> >>>>>> Director, Enterprise Strategy & Architecture
>> >>>>>> +1 (347) 746-9281
>> >>>>>> @kingmesal <https://twitter.com/kingmesal>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Julien
>>
>>
>
>
> --
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

In the patch I just sent, probably not.
I will adjust it and add the corresponding test.

On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde <jh...@apache.org> wrote:

> Can you use both together? Say
>
>   select columns
>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|') EXTEND
>   (foo INTEGER)
>
> Julian
>
>
>
> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > I took a stab at adding the TableFunction syntax without table(...) in
> > Calcite.
> > I have verified that both the table function and EXTEND (with or without
> > the keyword) work:
> >
> > https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >
> > These work:
> >
> > select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> > fieldDelimiter => '|')
> >
> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> > fieldDelimiter => '|'))
> >
> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >
> > select columns from dfs.`/path/to/myfile`('JSON')
> >
> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >
> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >
> >> Drill does implicitly what Phoenix does explicitly so I don't think we
> >> should constrain ourselves to having a union of the two syntaxes.
> >>
> >>
> >> That being said, I think we could make these work together... maybe.
> >>
> >> Remove the EXTEND-without-keyword syntax from the grammar.
> >>
> >> Create a new sub block in the table block that requires no keyword. There
> >> would be two paths (and would probably require some lookahead)
> >>
> >> option 1> unnamed parameters (1,2,3)
> >> option 2> named parameters (a => 1, b => 2, c => 3)
> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> golfHandicap INTEGER)
> >>
> >> Then we create a table function with options 1 & 2, an EXTEND clause for
> >> option 3.
> >>
> >> Best of both worlds?
> >>
> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> >> wrote:
> >>
> >>> Phoenix already supports columns at read-time using the syntax without the
> >>> EXTEND keyword as Julian indicated:
> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>   WHERE golfHandicap < 10;
> >>>
> >>> Changing this by requiring the EXTEND keyword would create a backward
> >>> compatibility problem.
> >>>
> >>> I think it'd be good if both of these extensions worked in Drill & Phoenix
> >>> given our Drillix initiative.
> >>>
> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >>>
> >>>> My proposal was an a-or-b choice using the FreeMarker template in the
> >>>> grammar, not something later.
> >>>>
> >>>> Actually, put another way: we may want to consider stating that we only
> >>>> incorporate SQL standards in our primary grammar. Any extensions should be
> >>>> optional grammar. We could simply have grammar plugins in Calcite (the same
> >>>> way we plug in external things in Drill).
> >>>>
> >>>> Trying to get every project to agree on extensions seems like it may be
> >>>> hard.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Jacques Nadeau
> >>>> CTO and Co-Founder, Dremio
> >>>>
> >>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I can see why Jacques wants this syntax.
> >>>>>
> >>>>> However a “switch” in a grammar is a bad idea. Grammars need to be
> >>>>> predictable. Any variation should happen at validation time, or later.
> >>>>>
> >>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
> >>>>> tough design discussion.
> >>>>>
> >>>>> EXTEND and eliding TABLE are both extensions to standard SQL, and they
> >>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> >>>>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
> >>>>> should be.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >>>>>>
> >>>>>> Looking at those two examples I agree with Jacques. The first appears more
> >>>>>> like a hint from the syntactic sugar point of view.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>
> >>>>>>> Since EXTEND is custom functionality, it seems reasonable that we could
> >>>>>>> have a switch. Given that SQL Server and Postgres support it, it seems
> >>>>>>> reasonable to support the table functions without the TABLE syntax.
> >>>>>>>
> >>>>>>> I for one definitely think the TABLE syntax is much more confusing to use,
> >>>>>>> especially in the example that we're looking to support, such as:
> >>>>>>>
> >>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>>> fieldDelimiter => '|', skipFirstRow => true)
> >>>>>>>
> >>>>>>> This seems much clearer than:
> >>>>>>>
> >>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >>>>>>> fieldDelimiter => '|', skipFirstRow => true))
> >>>>>>>
> >>>>>>> It also looks much more like a hint to the table (which is our goal).
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jacques Nadeau
> >>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for doing the legwork and finding what the other vendors do. It is
> >>>>>>>> indeed compelling that SQL Server and Postgres go beyond the standard and
> >>>>>>>> make the TABLE keyword optional.
> >>>>>>>>
> >>>>>>>> I tried that syntax in Calcite and discovered that there is a clash with
> >>>>>>>> one of our own (few) extensions. In
> >>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
> >>>>>>>> clause. You can write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE golfHandicap < 10;
> >>>>>>>>
> >>>>>>>> to tell Calcite that there are two undeclared columns in the Emp table but
> >>>>>>>> you would like to use them in this particular query. We chose to make the
> >>>>>>>> EXTEND keyword optional, so you could instead write
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>>>>>> WHERE golfHandicap < 10;
> >>>>>>>>
> >>>>>>>> That is uncomfortably close to
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
> >>>>>>>>
> >>>>>>>> so we would require
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>>>>>>
> >>>>>>>> if EmpFunction was a table-function. You could combine the two forms like
> >>>>>>>> this:
> >>>>>>>>
> >>>>>>>> SELECT *
> >>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>>>>>>> (anotherAttribute INTEGER);
> >>>>>>>>
> >>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we should also
> >>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>> - Table function syntax: I did a quick search and it seems there's no
> >>>>>>>>> consensus about this.
> >>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
> >>>>>>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> >>>>>>>>> expect it.
> >>>>>>>>> MySQL does not have table functions [5]
> >>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>>>>>>> Would it be reasonable to allow a switch in the grammar generation to
> >>>>>>>>> have a Postgres-compatible syntax? Currently in Drill we use the
> >>>>>>>>> MySQL-like syntax (back ticks for identifiers etc)
> >>>>>>>>>
> >>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>>>>>>
> >>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>>>>>>> But that seems too easy to be true. Possibly this method is called more
> >>>>>>>>> than once (before and after the function has been resolved?)
> >>>>>>>>>
> >>>>>>>>> FYI this would happen only when using named parameters. We do want to
> >>>>>>>>> overload in this case, which is why I'm looking into it.
> >>>>>>>>>
> >>>>>>>>> I'll file a JIRA for my other branch
> >>>>>>>>>
> >>>>>>>>> Julien
> >>>>>>>>>
> >>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> >>>>>>>>>> when there's more than 1 function with the same name.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes; see below.
> >>>>>>>>>>
> >>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now it is
> >>>>>>>>>> just able to specify the delimiter for csv files.
> >>>>>>>>>> So it seems the answer to my question 1) is that TableMacros are the
> >>>>>>>>>> way to go.
> >>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>>>>>>> necessary?*
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Consider:
> >>>>>>>>>>
> >>>>>>>>>> select * from myTable as f(x, y)
> >>>>>>>>>> select * from myTable f(x, y)
> >>>>>>>>>> select * from myFunction(x, y)
> >>>>>>>>>>
> >>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
> >>>>>>>>>> if f is a function with zero arguments, could you invoke it like
> >>>>>>>>>> this?:
> >>>>>>>>>>
> >>>>>>>>>> select * from f
> >>>>>>>>>>
> >>>>>>>>>> I don’t know the actual rationale. But I know that the SQL standards
> >>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>>>>>>
> >>>>>>>>>> I had to fix some things in Calcite to enable this:
> >>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>>>>>>> Drill uses Frameworks.getPlanner(), which does not seem to be
> >>>>>>>>>> exercised by the Maze example in Calcite, which is why some hooks
> >>>>>>>>>> were missing.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you log a jira case to track this bug?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>>>>>>> Here is a test that reproduces the problem:
> >>>>>>>>>> https://github.com/apache/calcite/pull/166
> >>>>>>>>>> If we return more than 1 TableFunction with the same name, we get an
> >>>>>>>>>> NPE later on.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
> >>>>>>>>>> case on resolution of overloaded functions when invoked with named
> >>>>>>>>>> arguments. (It probably applies to all functions, not just table
> >>>>>>>>>> functions.) The fix will take a while (if you wait for me to write
> >>>>>>>>>> it).
> >>>>>>>>>>
> >>>>>>>>>> For now please tell your users not to overload. :)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Julian
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Julien
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Jim Scott*
> >>>>>> Director, Enterprise Strategy & Architecture
> >>>>>> +1 (347) 746-9281
> >>>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Julien
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Can you use both together? Say

  select columns
  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|') EXTEND (foo INTEGER)

Julian



> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> I took a stab at adding the TableFunction syntax without table(...) in
> Calcite.
> I have verified that both the table function and EXTEND (with or without
> the keyword) work:
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> 
> These work:
> 
> select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|')
> 
> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|'))
> 
> select columns from table(dfs.`/path/to/myfile`('JSON'))
> 
> select columns from dfs.`/path/to/myfile`('JSON')
> 
> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> 
> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> Drill does implicitly what Phoenix does explicitly so I don't think we
>> should constrain ourselves to having a union of the two syntaxes.
>> 
>> 
>> That being said, I think we could make these work together... maybe.
>> 
>> Remove the EXTEND-without-keyword syntax from the grammar.
>>
>> Create a new sub block in the table block that requires no keyword. There
>> would be two paths (and would probably require some lookahead)
>>
>> option 1> unnamed parameters (1,2,3)
>> option 2> named parameters (a => 1, b => 2, c => 3)
>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> golfHandicap INTEGER)
>>
>> Then we create a table function with options 1 & 2, an EXTEND clause for
>> option 3.
>> 
>> Best of both worlds?
>> 
>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> wrote:
>> 
>>> Phoenix already supports columns at read-time using the syntax without the
>>> EXTEND keyword as Julian indicated:
>>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>   WHERE golfHandicap < 10;
>>>
>>> Changing this by requiring the EXTEND keyword would create a backward
>>> compatibility problem.
>>>
>>> I think it'd be good if both of these extensions worked in Drill & Phoenix
>>> given our Drillix initiative.
>>> 
>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>> 
>>>> My proposal was an a-or-b choice using the FreeMarker template in the
>>>> grammar, not something later.
>>>>
>>>> Actually, put another way: we may want to consider stating that we only
>>>> incorporate SQL standards in our primary grammar. Any extensions should be
>>>> optional grammar. We could simply have grammar plugins in Calcite (the same
>>>> way we plug in external things in Drill).
>>>>
>>>> Trying to get every project to agree on extensions seems like it may be
>>>> hard.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I can see why Jacques wants this syntax.
>>>>> 
>>>>> However a “switch” in a grammar is a bad idea. Grammars need to be
>>>>> predictable. Any variation should happen at validation time, or later.
>>>>>
>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>>>>> tough design discussion.
>>>>>
>>>>> EXTEND and eliding TABLE are both extensions to standard SQL, and they
>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>>>>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>>>>> should be.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>>>>>> 
>>>>>> Looking at those two examples I agree with Jacques. The first appears more
>>>>>> like a hint from the syntactic sugar point of view.
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>> 
>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we could
>>>>>>> have a switch. Given that SQL Server and Postgres support it, it seems
>>>>>>> reasonable to support the table functions without the TABLE syntax.
>>>>>>>
>>>>>>> I for one definitely think the TABLE syntax is much more confusing to use,
>>>>>>> especially in the example that we're looking to support, such as:
>>>>>>>
>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>> fieldDelimiter => '|', skipFirstRow => true)
>>>>>>>
>>>>>>> This seems much clearer than:
>>>>>>>
>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>> fieldDelimiter => '|', skipFirstRow => true))
>>>>>>>
>>>>>>> It also looks much more like a hint to the table (which is our goal).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>>>>>>> indeed compelling that SQL Server and Postgres go beyond the standard and
>>>>>>>> make the TABLE keyword optional.
>>>>>>>>
>>>>>>>> I tried that syntax in Calcite and discovered that there is a clash with
>>>>>>>> one of our own (few) extensions. In
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
>>>>>>>> clause. You can write
>>>>>>>>
>>>>>>>> SELECT *
>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>> 
>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp table but
>>>>>>>> you would like to use them in this particular query. We chose to make the
>>>>>>>> EXTEND keyword optional, so you could instead write
>>>>>>>>
>>>>>>>> SELECT *
>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>> 
>>>>>>>> That is uncomfortably close to
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>> 
>>>>>>>> so we would require
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>> 
>>>>>>>> if EmpFunction was a table-function. You could combine the two
>> forms
>>>>> like
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>> 
>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we
>>> should
>>>>>>> also
>>>>>>>> ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> - Table function syntax: I did a quick search and it seems there's
>>>>>>>>> no consensus about this.
>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling
>>>>>>>>> table functions without the table(...) wrapper, while Oracle [3]
>>>>>>>>> and DB2 [4] expect it.
>>>>>>>>> MySQL does not have table functions [5].
>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation
>>>>>>>>> to have a Postgres-compatible syntax? Currently in Drill we use the
>>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
>>>>>>>>> 
>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>> 
>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>> But that seems too easy to be true. Possibly this method is called
>>>>>>>>> more than once (before and after the function has been resolved?)
>>>>>>>>> 
>>>>>>>>> FYI this would happen only when using named parameters. We do want
>>>>>>>>> to overload in this case, which is why I'm looking into it.
>>>>>>>>> 
>>>>>>>>> I'll file a JIRA for my other branch.
>>>>>>>>> 
>>>>>>>>> Julien
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> Calcite
>>>>> when
>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes; see below.
>>>>>>>>>> 
>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now it
>>>>>>>>>> just lets you specify the delimiter for CSV files.
>>>>>>>>>> So it seems the answer to my question 1) is that TableMacros are the
>>>>>>>>>> way to go.
>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>>>>>>> necessary?*
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Consider:
>>>>>>>>>> 
>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>> 
>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>>> Also,
>>>>>>> if
>>>>>>>> f
>>>>>>>>>> is a function with zero arguments, could you invoke it like
>> this?:
>>>>>>>>>> 
>>>>>>>>>> select * from f
>>>>>>>>>> 
>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>> standards
>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>> 
>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>>> in
>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> get
>>> a
>>>>>>> NPE
>>>>>>>>>> later on.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>>> JIRA
>>>>>>> case
>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>> arguments.
>>>>>>>>>> (It probably applies to all functions, not just table
>> functions.)
>>>>> The
>>>>>>>> fix
>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>> 
>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Jim Scott*
>>>>>> Director, Enterprise Strategy & Architecture
>>>>>> +1 (347) 746-9281
>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>> 
>>>>>> <http://www.mapr.com/>
>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Can you use both together? Say

  select columns
  from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|') EXTEND (foo INTEGER)
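
To make the question concrete: in the query above the parser sees two parenthesized lists in a row, and it has to tell a table-function argument list from an EXTEND column list. Below is a minimal Python sketch of one way to separate them with a single token of lookahead after the first item. It is only an illustration of the idea, not Calcite's actual JavaCC grammar; the names `tokenize` and `classify` are hypothetical.

```python
import re

# Hypothetical, simplified tokenizer: '=>', identifiers, quoted strings,
# numbers, and the punctuation that can appear in these lists.
TOKEN = re.compile(r"\s*(=>|[A-Za-z_][A-Za-z_0-9]*|'(?:[^']|'')*'|\d+(?:\.\d+)?|[(),])")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def tokenize(text):
    """Split a parenthesized list into tokens, rejecting anything unknown."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def classify(clause):
    """Classify a non-empty parenthesized list by looking at its first two tokens.

    identifier followed by '=>'        -> named arguments
    identifier followed by identifier  -> column definitions (EXTEND-style)
    anything else                      -> positional arguments
    """
    tokens = tokenize(clause)
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("expected a non-empty parenthesized list")
    first, second = tokens[1], tokens[2]
    if IDENTIFIER.fullmatch(first):
        if second == "=>":
            return "named arguments"
        if IDENTIFIER.fullmatch(second):
            return "column definitions"
    return "positional arguments"


if __name__ == "__main__":
    for clause in ("(type => 'TEXT', fieldDelimiter => '|')",
                   "(foo INTEGER)",
                   "('JSON')",
                   "(favoriteBand, golfHandicap)"):
        print(clause, "->", classify(clause))
```

Under these assumptions the two lists in the query fall into different classes, so both could follow the same table name. A list of bare identifiers such as (favoriteBand, golfHandicap) is treated as positional arguments here, which is the case that sits closest to the EXTEND form.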

Julian



> On Nov 10, 2015, at 10:51 AM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> I took a stab at adding the TableFunction syntax without table(...) in
> Calcite.
> I have verified that both the table function and EXTEND (with or without
> the keyword) work:
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> 
> These work:
> 
> select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
> '|')
> 
> select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter => '|'))
> 
> select columns from table(dfs.`/path/to/myfile`('JSON'))
> 
> select columns from dfs.`/path/to/myfile`('JSON')
> 
> select columns from dfs.`/path/to/myfile`(type => 'JSON')
> 
> On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> Drill does implicitly what Phoenix does explicitly so I don't think we
>> should constrain ourselves to having a union of the two syntaxes.
>> 
>> 
>> That being said, I think we could make these work together... maybe.
>> 
>> Remove the EXTENDS without keyword syntax from the grammar.
>> 
>> Create a new sub block in the table block that requires no keyword. There
>> would be two paths (and would probably require some lookahead)
>> 
>> option 1> unnamed parameters (1,2,3)
>> option 2> named parameters (a => 1, b=>2, c=> 3)
>> option 3> create table field pattern (favoriteBand VARCHAR(100),
>> golfHandicap INTEGER)
>> 
>> Then we create a table function with options 1 & 2, an EXTENDS clause for
>> option 3.
>> 
>> Best of both worlds?
>> 
>> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
>> wrote:
>> 
>>> Phoenix already supports columns at read-time using the syntax without
>>> the EXTEND keyword as Julian indicated:
>>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>   WHERE golfHandicap < 10;
>>> 
>>> Changing this by requiring the EXTEND keyword would create a backward
>>> compatibility problem.
>>> 
>>> I think it'd be good if both of these extensions worked in Drill &
>>> Phoenix given our Drillix initiative.
>>> 
>>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>>> 
>>>> My proposal was an a or b using the freemarker template in the grammar,
>>>> not something later.
>>>> 
>>>> Actually, put another way: we may want to consider stating that we only
>>>> incorporate SQL standards in our primary grammar. Any extensions should
>>> be
>>>> optional grammar. We could simply have grammar plugins in Calcite (the
>>> same
>>>> way we plug in external things in Drill).
>>>> 
>>>> Trying to get every project to agree on extensions seems like it may be
>>>> hard.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I can see why Jacques wants this syntax.
>>>>> 
>>>>> However, a “switch” in a grammar is a bad idea. Grammars need to be
>>>>> predictable. Any variation should happen at validation time, or later.
>>>>> 
>>>>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>>>>> tough design discussion.
>>>>> 
>>>>> EXTEND and eliding TABLE are both extensions to standard SQL, and they
>>>>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>>>>> which I mean Jacques and James, I guess) need to agree what the SQL
>>>>> syntax should be.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>>>>>> 
>>>>>> Looking at those two examples I agree with Jacques. The first
>> appears
>>>>> more
>>>>>> like a hint from the syntactic sugar point of view.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
>>> 
>>>>> wrote:
>>>>>> 
>>>>>>> Since EXTEND is custom functionality, it seems reasonable that we
>>>>>>> could have a switch. Given that SQL Server and Postgres support it,
>>>>>>> it seems reasonable to support the table functions without the TABLE
>>>>>>> syntax.
>>>>>>> 
>>>>>>> I for one definitely think the TABLE syntax is much more confusing to
>>>>>>> use, especially in the example that we're looking to support, such as:
>>>>>>> 
>>>>>>> select * from dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>> fieldDelimiter => '|', skipFirstRow => true)
>>>>>>> 
>>>>>>> This seems much clearer than:
>>>>>>> 
>>>>>>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>>>>>>> fieldDelimiter => '|', skipFirstRow => true))
>>>>>>> 
>>>>>>> It also looks much more like a hint to the table (which is our goal).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks for doing the legwork and finding what the other vendors do.
>>>>>>>> It is indeed compelling that SQL Server and Postgres go beyond the
>>>>>>>> standard and make the TABLE keyword optional.
>>>>>>>> 
>>>>>>>> I tried that syntax in Calcite and discovered that there is a clash
>>>>>>>> with one of our own (few) extensions. In
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>>>>>>>> EXTEND clause. You can write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>> 
>>>>>>>> to tell Calcite that there are two undeclared columns in the Emp
>>>>>>>> table but you would like to use them in this particular query. We
>>>>>>>> chose to make the EXTEND keyword optional, so you could instead write
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>>>>>> WHERE golfHandicap < 10;
>>>>>>>> 
>>>>>>>> That is uncomfortably close to
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM EmpFunction (favoriteBand, golfHandicap);
>>>>>>>> 
>>>>>>>> so we would require
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>>>>>>> 
>>>>>>>> if EmpFunction was a table-function. You could combine the two forms
>>>>>>>> like this:
>>>>>>>> 
>>>>>>>> SELECT *
>>>>>>>> FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>>>>>>> (anotherAttribute INTEGER);
>>>>>>>> 
>>>>>>>> We could revisit whether EXTEND is optional, I suppose. But we should
>>>>>>>> also ask whether requiring folks to type TABLE is such a hardship.
>>>>>>>> 
>>>>>>>> Julian
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> - Table function syntax: I did a quick search and it seems there's
>>>>>>>>> no consensus about this.
>>>>>>>>> It seems that Postgres [1] and SQL Server [2] both allow calling
>>>>>>>>> table functions without the table(...) wrapper, while Oracle [3]
>>>>>>>>> and DB2 [4] expect it.
>>>>>>>>> MySQL does not have table functions [5].
>>>>>>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>>>>>>> Would it be reasonable to allow a switch in the grammar generation
>>>>>>>>> to have a Postgres-compatible syntax? Currently in Drill we use the
>>>>>>>>> MySQL-like syntax (backticks for identifiers, etc.)
>>>>>>>>> 
>>>>>>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>>>>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>>>>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>>>>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>>>>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>>>>>>> 
>>>>>>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>>>>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>>>>>>> But that seems too easy to be true. Possibly this method is called
>>>>>>>>> more than once (before and after the function has been resolved?)
>>>>>>>>> 
>>>>>>>>> FYI this would happen only when using named parameters. We do want
>>>>>>>>> to overload in this case, which is why I'm looking into it.
>>>>>>>>> 
>>>>>>>>> I'll file a JIRA for my other branch.
>>>>>>>>> 
>>>>>>>>> Julien
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> TL;DR: TableMacro works for me; I need help with a bug in
>> Calcite
>>>>> when
>>>>>>>>>> there's more than 1 function with the same name.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes; see below.
>>>>>>>>>> 
>>>>>>>>>> FYI: I have a prototype of TableMacro working in Drill. For now it
>>>>>>>>>> just lets you specify the delimiter for CSV files.
>>>>>>>>>> So it seems the answer to my question 1) is that TableMacros are the
>>>>>>>>>> way to go.
>>>>>>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>>>>>>> necessary?*
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Consider:
>>>>>>>>>> 
>>>>>>>>>> select * from myTable as f(x, y)
>>>>>>>>>> select * from myTable f(x, y)
>>>>>>>>>> select * from myFunction(x, y)
>>>>>>>>>> 
>>>>>>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
>>> Also,
>>>>>>> if
>>>>>>>> f
>>>>>>>>>> is a function with zero arguments, could you invoke it like
>> this?:
>>>>>>>>>> 
>>>>>>>>>> select * from f
>>>>>>>>>> 
>>>>>>>>>> I don’t know the actual rationale. But I know that the SQL
>>> standards
>>>>>>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>>>>>>> 
>>>>>>>>>> I had to fix some things in Calcite to enable this:
>>>>>>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>>>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
>>> in
>>>>>>>>>> Calcite for the Maze example.
>>>>>>>>>> Which is why some hooks were missing.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Can you log a jira case to track this bug?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>>>>>>> Here is a test that reproduces the problem:
>>>>>>>>>> https://github.com/apache/calcite/pull/166
>>>>>>>>>> If we return more than 1 TableFunction with the same name, we
>> get
>>> a
>>>>>>> NPE
>>>>>>>>>> later on.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, I knew there was a problem with overloading. Please log a
>>> JIRA
>>>>>>> case
>>>>>>>>>> on resolution of overloaded functions when invoked with named
>>>>>>> arguments.
>>>>>>>>>> (It probably applies to all functions, not just table
>> functions.)
>>>>> The
>>>>>>>> fix
>>>>>>>>>> will take a while (if you wait for me to write it).
>>>>>>>>>> 
>>>>>>>>>> For now please tell your users not to overload. :)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Jim Scott*
>>>>>> Director, Enterprise Strategy & Architecture
>>>>>> +1 (347) 746-9281
>>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>>> 
>>>>>> <http://www.mapr.com/>
>>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I took a stab at adding the TableFunction syntax without table(...) in
Calcite.
I have verified that both the table function and EXTEND (with or without
the keyword) work:
https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34

These work:

select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
'|')

select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
fieldDelimiter => '|'))

select columns from table(dfs.`/path/to/myfile`('JSON'))

select columns from dfs.`/path/to/myfile`('JSON')

select columns from dfs.`/path/to/myfile`(type => 'JSON')
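
The five forms above differ only in whether the arguments are named or positional and whether the call is wrapped in table(...). As a rough illustration, here is a small Python sketch that renders such a FROM item from a dict of options. The helper name `table_call` and its parameters are hypothetical and are not part of Drill or Calcite; it only builds the query text.

```python
def table_call(path, options=None, positional=(), wrap_in_table=False, schema="dfs"):
    """Render a FROM-clause item such as dfs.`/path`(type => 'TEXT').

    options       -- named arguments, rendered as name => value
    positional    -- unnamed arguments, rendered before the named ones
    wrap_in_table -- if True, wrap the call in table(...)
    """
    def literal(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Quote strings SQL-style, doubling any embedded single quote.
        return "'" + str(value).replace("'", "''") + "'"

    args = [literal(value) for value in positional]
    args += [f"{name} => {literal(value)}" for name, value in (options or {}).items()]
    call = f"{schema}.`{path}`({', '.join(args)})"
    return f"table({call})" if wrap_in_table else call


if __name__ == "__main__":
    opts = {"type": "TEXT", "fieldDelimiter": "|"}
    print("select columns from " + table_call("/path/to/myfile", opts))
    print("select columns from " + table_call("/path/to/myfile", opts, wrap_in_table=True))
    print("select columns from " + table_call("/path/to/myfile", positional=("JSON",)))
```

Keeping the wrapper as a flag makes it easy to generate both spellings of the same query when testing a parser change like this one.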

On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Drill does implicitly what Phoenix does explicitly so I don't think we
> should constrain ourselves to having a union of the two syntaxes.
>
>
> That being said, I think we could make these work together... maybe.
>
> Remove the EXTENDS without keyword syntax from the grammar.
>
> Create a new sub block in the table block that requires no keyword. There
> would be two paths (and would probably require some lookahead)
>
> option 1> unnamed parameters (1,2,3)
> option 2> named parameters (a => 1, b=>2, c=> 3)
> option 3> create table field pattern (favoriteBand VARCHAR(100),
> golfHandicap INTEGER)
>
> Then we create a table function with options 1 & 2, an EXTENDS clause for
> option 3.
>
> Best of both worlds?
>
> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> wrote:
>
> > Phoenix already supports columns at read-time using the syntax without
> > the EXTEND keyword as Julian indicated:
> >    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >    WHERE golfHandicap < 10;
> >
> > Changing this by requiring the EXTEND keyword would create a backward
> > compatibility problem.
> >
> > I think it'd be good if both of these extensions worked in Drill &
> > Phoenix given our Drillix initiative.
> >
> > On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> > > My proposal was an a or b using the freemarker template in the grammar,
> > > not something later.
> > >
> > > Actually, put another way: we may want to consider stating that we only
> > > incorporate SQL standards in our primary grammar. Any extensions should
> > be
> > > optional grammar. We could simply have grammar plugins in Calcite (the
> > same
> > > way we plug in external things in Drill).
> > >
> > > Trying to get every project to agree on extensions seems like it may be
> > > hard.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >> I can see why Jacques wants this syntax.
> > >>
> > >> However, a “switch” in a grammar is a bad idea. Grammars need to be
> > >> predictable. Any variation should happen at validation time, or later.
> > >>
> > >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> > >> tough design discussion.
> > >>
> > >> EXTEND and eliding TABLE are both extensions to standard SQL, and they
> > >> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> > >> which I mean Jacques and James, I guess) need to agree what the SQL
> > >> syntax should be.
> > >>
> > >> Julian
> > >>
> > >>
> > >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> > >> >
> > >> > Looking at those two examples I agree with Jacques. The first
> appears
> > >> more
> > >> > like a hint from the syntactic sugar point of view.
> > >> >
> > >> >
> > >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >
> > >> wrote:
> > >> >
> > >> >> Since EXTEND is custom functionality, it seems reasonable that we
> > >> >> could have a switch. Given that SQL Server and Postgres support it,
> > >> >> it seems reasonable to support the table functions without the TABLE
> > >> >> syntax.
> > >> >>
> > >> >> I for one definitely think the TABLE syntax is much more confusing to
> > >> >> use, especially in the example that we're looking to support, such as:
> > >> >>
> > >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> > >> >> fieldDelimiter => '|', skipFirstRow => true)
> > >> >>
> > >> >> This seems much clearer than:
> > >> >>
> > >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> > >> >> fieldDelimiter => '|', skipFirstRow => true))
> > >> >>
> > >> >> It also looks much more like a hint to the table (which is our goal).
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Jacques Nadeau
> > >> >> CTO and Co-Founder, Dremio
> > >> >>
> > >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> > wrote:
> > >> >>
> > >> >>> Thanks for doing the legwork and finding what the other vendors do.
> > >> >>> It is indeed compelling that SQL Server and Postgres go beyond the
> > >> >>> standard and make the TABLE keyword optional.
> > >> >>>
> > >> >>> I tried that syntax in Calcite and discovered that there is a clash
> > >> >>> with one of our own (few) extensions. In
> > >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> > >> >>> EXTEND clause. You can write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE golfHandicap < 10;
> > >> >>>
> > >> >>> to tell Calcite that there are two undeclared columns in the Emp
> > >> >>> table but you would like to use them in this particular query. We
> > >> >>> chose to make the EXTEND keyword optional, so you could instead write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE golfHandicap < 10;
> > >> >>>
> > >> >>> That is uncomfortably close to
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> > >> >>>
> > >> >>> so we would require
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> > >> >>>
> > >> >>> if EmpFunction was a table-function. You could combine the two forms
> > >> >>> like this:
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > >> >>> (anotherAttribute INTEGER);
> > >> >>>
> > >> >>> We could revisit whether EXTEND is optional, I suppose. But we should
> > >> >>> also ask whether requiring folks to type TABLE is such a hardship.
> > >> >>>
> > >> >>> Julian
> > >> >>>
> > >> >>>
> > >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > >> >>>>
> > >> >>>> - Table function syntax: I did a quick search and it seems there's
> > >> >>>> no consensus about this.
> > >> >>>> It seems that Postgres [1] and SQL Server [2] both allow calling
> > >> >>>> table functions without the table(...) wrapper, while Oracle [3]
> > >> >>>> and DB2 [4] expect it.
> > >> >>>> MySQL does not have table functions [5].
> > >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> > >> >>>> Would it be reasonable to allow a switch in the grammar generation
> > >> >>>> to have a Postgres-compatible syntax? Currently in Drill we use the
> > >> >>>> MySQL-like syntax (backticks for identifiers, etc.)
> > >> >>>>
> > >> >>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > >> >>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > >> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > >> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > >> >>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >> >>>>
> > >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> > >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> > >> >>>> But that seems too easy to be true. Possibly this method is called
> > >> >>>> more than once (before and after the function has been resolved?)
> > >> >>>>
> > >> >>>> FYI this would happen only when using named parameters. We do want
> > >> >>>> to overload in this case, which is why I'm looking into it.
> > >> >>>>
> > >> >>>> I'll file a JIRA for my other branch.
> > >> >>>>
> > >> >>>> Julien
> > >> >>>>
> > >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> > >> wrote:
> > >> >>>>
> > >> >>>>>
> > >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> > >> wrote:
> > >> >>>>>
> > >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in
> Calcite
> > >> when
> > >> >>>>> there's more than 1 function with the same name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes; see below.
> > >> >>>>>
> > >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now it
> > >> >>>>> just lets you specify the delimiter for CSV files.
> > >> >>>>> So it seems the answer to my question 1) is that TableMacros are the
> > >> >>>>> way to go.
> > >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> >>>>> necessary?*
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Consider:
> > >> >>>>>
> > >> >>>>> select * from myTable as f(x, y)
> > >> >>>>> select * from myTable f(x, y)
> > >> >>>>> select * from myFunction(x, y)
> > >> >>>>>
> > >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> > Also,
> > >> >> if
> > >> >>> f
> > >> >>>>> is a function with zero arguments, could you invoke it like
> this?:
> > >> >>>>>
> > >> >>>>> select * from f
> > >> >>>>>
> > >> >>>>> I don’t know the actual rationale. But I know that the SQL
> > standards
> > >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> > >> >>>>>
> > >> >>>>> I had to fix some things in Calcite to enable this:
> > >> >>>>> https://github.com/dremio/calcite/pull/1/files
> > >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> > in
> > >> >>>>> Calcite for the Maze example.
> > >> >>>>> Which is why some hooks were missing.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Can you log a jira case to track this bug?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> > >> >>>>> Here is a test that reproduces the problem:
> > >> >>>>> https://github.com/apache/calcite/pull/166
> > >> >>>>> If we return more than 1 TableFunction with the same name, we
> get
> > a
> > >> >> NPE
> > >> >>>>> later on.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> > JIRA
> > >> >> case
> > >> >>>>> on resolution of overloaded functions when invoked with named
> > >> >> arguments.
> > >> >>>>> (It probably applies to all functions, not just table
> functions.)
> > >> The
> > >> >>> fix
> > >> >>>>> will take a while (if you wait for me to write it).
> > >> >>>>>
> > >> >>>>> For now please tell your users not to overload. :)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Julian
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Jim Scott*
> > >> > Director, Enterprise Strategy & Architecture
> > >> > +1 (347) 746-9281
> > >> > @kingmesal <https://twitter.com/kingmesal>
> > >> >
> > >> > <http://www.mapr.com/>
> > >> > [image: MapR Technologies] <http://www.mapr.com>
> > >> >
> > >> >
> > >>
> > >>
> > >
> >
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I took a stab at adding the TableFunction syntax without table(...) in
Calcite.
I have verified that both the table function and EXTEND (with or without
the keyword) work:
https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34

These work:

select columns from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter =>
'|')

select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
fieldDelimiter => '|'))

select columns from table(dfs.`/path/to/myfile`('JSON'))

select columns from dfs.`/path/to/myfile`('JSON')

select columns from dfs.`/path/to/myfile`(type => 'JSON')

On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau <ja...@apache.org> wrote:

> Drill does implicitly what Phoenix does explicitly so I don't think we
> should constrain ourselves to having a union of the two syntaxes.
>
>
> That being said, I think we could make these work together... maybe.
>
> Remove the EXTENDS without keyword syntax from the grammar.
>
> Create a new sub block in the table block that requires no keyword. There
> would be two paths (and would probably require some lookahead)
>
> option 1> unnamed parameters (1,2,3)
> option 2> named parameters (a => 1, b=>2, c=> 3)
> option 3> create table field pattern (favoriteBand VARCHAR(100),
> golfHandicap INTEGER)
>
> Then we create a table function with options 1 & 2, an EXTENDS clause for
> option 3.
>
> Best of both worlds?
>
> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org>
> wrote:
>
> > Phoenix already supports columns at read-time using the syntax without
> > the EXTEND keyword as Julian indicated:
> >    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >    WHERE golfHandicap < 10;
> >
> > Changing this by requiring the EXTEND keyword would create a backward
> > compatibility problem.
> >
> > I think it'd be good if both of these extensions worked in Drill &
> > Phoenix given our Drillix initiative.
> >
> > On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> > > My proposal was an a or b using the freemarker template in the grammar,
> > > not something later.
> > >
> > > Actually, put another way: we may want to consider stating that we only
> > > incorporate SQL standards in our primary grammar. Any extensions should
> > be
> > > optional grammar. We could simply have grammar plugins in Calcite (the
> > same
> > > way we plug in external things in Drill).
> > >
> > > Trying to get every project to agree on extensions seems like it may be
> > > hard.
> > >
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >> I can see why Jacques wants this syntax.
> > >>
> > >> However a "switch" in a grammar is a bad idea. Grammars need to be
> > >> predictable. Any variation should happen at validation time, or later.
> > >>
> > >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> > >> tough design discussion.
> > >>
> > >> EXTENDS and eliding TABLE are both extensions to standard SQL, and
> they
> > >> are both applicable to Drill and Phoenix. I think Drill and Phoenix
> (by
> > >> which I mean Jacques and James, I guess) need to agree what the SQL
> > syntax
> > >> should be.
> > >>
> > >> Julian
> > >>
> > >>
> > >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> > >> >
> > >> > Looking at those two examples I agree with Jacques. The first
> appears
> > >> more
> > >> > like a hint from the syntactic sugar point of view.
> > >> >
> > >> >
> > >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <jacques@dremio.com
> >
> > >> wrote:
> > >> >
> > >> >> Since EXTEND is custom functionality, it seems reasonable that we
> > could
> > >> >> have a switch. Given that SQL Server and Postgres support it seems
> > >> >> reasonable to support the table functions without the TABLE syntax.
> > >> >>
> > >> >> I for one definitely think the TABLE syntax is much more confusing
> to
> > >> use,
> > >> >> especially in the example that we're looking to support, such as:
> > >> >>
> > >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> > >> >> fieldDelimiter => '|', skipFirstRow => true)
> > >> >>
> > >> >> This seems much clearer than:
> > >> >>
> > >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> > >> >> fieldDelimiter => '|', skipFirstRow => true))
> > >> >>
> > >> >> It also looks much more like a hint to the table (which is our
> goal).
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Jacques Nadeau
> > >> >> CTO and Co-Founder, Dremio
> > >> >>
> > >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> > wrote:
> > >> >>
> > >> >>> Thanks for doing the legwork and finding what the other vendors do.
> > >> >>> It is indeed compelling that SQL Server and Postgres go beyond the
> > >> >>> standard and make the TABLE keyword optional.
> > >> >>>
> > >> >>> I tried that syntax in Calcite and discovered that there is a
> clash
> > >> with
> > >> >>> one of our own (few) extensions. In
> > >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> > >> EXTENDS
> > >> >>> clause. You can write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE golfHandicap < 10;
> > >> >>>
> > >> >>> to tell Calcite that there are two undeclared columns in the Emp
> > table
> > >> >> but
> > >> >>> you would like to use them in this particular query. We chose to
> > make
> > >> the
> > >> >>> EXTEND keyword optional, so you could instead write
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> > >> >>>  WHERE golfHandicap < 10;
> > >> >>>
> > >> >>> That is uncomfortably close to
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> > >> >>>
> > >> >>> so we would require
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> > >> >>>
> > >> >>> if EmpFunction was a table-function. You could combine the two
> forms
> > >> like
> > >> >>> this:
> > >> >>>
> > >> >>>  SELECT *
> > >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> > >> >>> (anotherAttribute INTEGER);
> > >> >>>
> > >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> > should
> > >> >> also
> > >> >>> ask whether requiring folks to type TABLE is such a hardship.
> > >> >>>
> > >> >>> Julian
> > >> >>>
> > >> >>>
> > >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > >> >>>>
> > >> >>>> - Table function syntax: I did a quick search and it seems there's
> > >> >>>> no consensus about this.
> > >> >>>> It seems that Postgres [1] and SQL Server [2] both allow calling
> > >> >>>> table functions without the table(...) wrapper while Oracle [3] and
> > >> >>>> DB2 [4] expect it.
> > >> >>>> MySQL does not have table functions [5]
> > >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> > >> >>>> Would it be reasonable to allow a switch in the grammar generation
> > >> >>>> to have a Postgres-compatible syntax? Currently in Drill we use the
> > >> >>>> MySQL-like syntax (back ticks for identifiers etc.)
> > >> >>>>
> > >> >>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > >> >>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > >> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > >> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > >> >>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >> >>>>
> > >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> > >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> > >> >>>> But that seems too easy to be true. Possibly this method is called
> > >> >>>> more than once (before and after the function has been resolved?)
> > >> >>>>
> > >> >>>> FYI this would happen only when using named parameters. We do want
> > >> >>>> to overload in this case, which is why I'm looking into it.
> > >> >>>>
> > >> >>>> I'll file a JIRA for my other branch
> > >> >>>>
> > >> >>>> Julien
> > >> >>>>
> > >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> > >> wrote:
> > >> >>>>
> > >> >>>>>
> > >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> > >> wrote:
> > >> >>>>>
> > >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in
> Calcite
> > >> when
> > >> >>>>> there's more than 1 function with the same name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes; see below.
> > >> >>>>>
> > >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> > >> >>>>> just being able to specify the delimiter for csv files.
> > >> >>>>> So it seems the answer to my question 1) is that TableMacros are
> > >> >>>>> the way to go.
> > >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> >>>>> necessary?*
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Consider:
> > >> >>>>>
> > >> >>>>> select * from myTable as f(x, y)
> > >> >>>>> select * from myTable f(x, y)
> > >> >>>>> select * from myFunction(x, y)
> > >> >>>>>
> > >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> > Also,
> > >> >> if
> > >> >>> f
> > >> >>>>> is a function with zero arguments, could you invoke it like
> this?:
> > >> >>>>>
> > >> >>>>> select * from f
> > >> >>>>>
> > >> >>>>> I don’t know the actual rationale. But I know that the SQL
> > standards
> > >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> > >> >>>>>
> > >> >>>>> I had to fix some things in Calcite to enable this:
> > >> >>>>> https://github.com/dremio/calcite/pull/1/files
> > >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> > in
> > >> >>>>> Calcite for the Maze example.
> > >> >>>>> Which is why some hooks were missing.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Can you log a jira case to track this bug?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> > >> >>>>> Here is a test that reproduces the problem:
> > >> >>>>> https://github.com/apache/calcite/pull/166
> > >> >>>>> If we return more than 1 TableFunction with the same name, we
> get
> > a
> > >> >> NPE
> > >> >>>>> later on.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> > JIRA
> > >> >> case
> > >> >>>>> on resolution of overloaded functions when invoked with named
> > >> >> arguments.
> > >> >>>>> (It probably applies to all functions, not just table
> functions.)
> > >> The
> > >> >>> fix
> > >> >>>>> will take a while (if you wait for me to write it).
> > >> >>>>>
> > >> >>>>> For now please tell your users not to overload. :)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Julian
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >> >>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Jim Scott*
> > >> > Director, Enterprise Strategy & Architecture
> > >> > +1 (347) 746-9281
> > >> > @kingmesal <https://twitter.com/kingmesal>
> > >> >
> > >>
> > >>
> > >
> >
>



-- 
Julien

Re: select from table with options

Posted by Jacques Nadeau <ja...@apache.org>.

Drill does implicitly what Phoenix does explicitly so I don't think we
should constrain ourselves to having a union of the two syntaxes.


That being said, I think we could make these work together... maybe.

Remove the keyword-less form of the EXTEND clause from the grammar.

Create a new sub block in the table block that requires no keyword. There
would be two paths (and would probably require some lookahead)

option 1> unnamed parameters (1,2,3)
option 2> named parameters (a => 1, b=>2, c=> 3)
option 3> create table field pattern (favoriteBand VARCHAR(100),
golfHandicap INTEGER)

Then we create a table function for options 1 & 2, and an EXTEND clause for
option 3.
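
To make the lookahead idea concrete, here is a rough sketch in Python of how
the three forms could be told apart from the text between the parentheses.
This is only an illustration under my own assumptions: the names
split_top_level, classify_item and classify_parenthesized are hypothetical,
the column-definition pattern is deliberately simplistic, and a real
implementation would live in the parser grammar rather than in string
handling.

```python
import re

# identifier, whitespace, type name, optional (precision[, scale])
COLUMN_DEF = re.compile(
    r"^[A-Za-z_]\w*\s+[A-Za-z_]\w*(\s*\(\s*\d+(\s*,\s*\d+)?\s*\))?$")
# A single-quoted SQL string, with '' as the escaped quote.
QUOTED = re.compile(r"'(?:[^']|'')*'")


def split_top_level(text):
    """Split on commas outside parentheses and single-quoted strings."""
    items, depth, in_quote, current = [], 0, False, []
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                items.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    items.append("".join(current).strip())
    return items


def classify_item(item):
    """Classify one item as 'named', 'columns' or 'positional'."""
    # Ignore anything inside string literals when looking for '=>'.
    if "=>" in QUOTED.sub("''", item):
        return "named"        # option 2: a => 1
    if COLUMN_DEF.match(item):
        return "columns"      # option 3: favoriteBand VARCHAR(100)
    return "positional"       # option 1: 1, 'JSON', someColumn


def classify_parenthesized(text):
    """Return the single form used by every item, or raise if they are mixed."""
    kinds = {classify_item(item) for item in split_top_level(text)}
    if len(kinds) != 1:
        raise ValueError("mixed argument forms: " + ", ".join(sorted(kinds)))
    return kinds.pop()


if __name__ == "__main__":
    for sample in ("1,2,3",
                   "a => 1, b=>2, c=> 3",
                   "favoriteBand VARCHAR(100), golfHandicap INTEGER"):
        print(sample, "->", classify_parenthesized(sample))
```

In this sketch "positional" and "named" would both be routed to the table
function, and "columns" to the EXTEND clause.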

Best of both worlds?

On Sat, Nov 7, 2015 at 4:44 PM, James Taylor <ja...@apache.org> wrote:

> Phoenix already supports columns at read-time using the syntax without the
> EXTEND keyword, as Julian indicated:
>    SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>    WHERE golfHandicap < 10;
>
> Changing this by requiring the EXTEND keyword would create a backward
> compatibility problem.
>
> I think it'd be good if both of these extensions worked in Drill & Phoenix
> given our Drillix initiative.
>
> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:
>
> > My proposal was an a or b using the freemarker template in the grammar,
> > not something later.
> >
> > Actually, put another way: we may want to consider stating that we only
> > incorporate SQL standards in our primary grammar. Any extensions should
> be
> > optional grammar. We could simply have grammar plugins in Calcite (the
> same
> > way we plug in external things in Drill).
> >
> > Trying to get every project to agree on extensions seems like it may be
> > hard.
> >
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> I can see why Jacques wants this syntax.
> >>
> >> However a "switch" in a grammar is a bad idea. Grammars need to be
> >> predictable. Any variation should happen at validation time, or later.
> >>
> >> Also, we shouldn’t add configuration parameters as a way of avoiding a
> >> tough design discussion.
> >>
> >> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
> >> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> >> which I mean Jacques and James, I guess) need to agree what the SQL
> syntax
> >> should be.
> >>
> >> Julian
> >>
> >>
> >> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >> >
> >> > Looking at those two examples I agree with Jacques. The first appears
> >> more
> >> > like a hint from the syntactic sugar point of view.
> >> >
> >> >
> >> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
> >> wrote:
> >> >
> >> >> Since EXTEND is custom functionality, it seems reasonable that we
> could
> >> >> have a switch. Given that SQL Server and Postgres support it seems
> >> >> reasonable to support the table functions without the TABLE syntax.
> >> >>
> >> >> I for one definitely think the TABLE syntax is much more confusing to
> >> use,
> >> >> especially in the example that we're looking to support, such as:
> >> >>
> >> >> select * from dfs.`/myfolder/mytable` (type => 'CSV',
> >> >> fieldDelimiter => '|', skipFirstRow => true)
> >> >>
> >> >> This seems much clearer than:
> >> >>
> >> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> >> >> fieldDelimiter => '|', skipFirstRow => true))
> >> >>
> >> >> It also looks much more like a hint to the table (which is our goal).
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jacques Nadeau
> >> >> CTO and Co-Founder, Dremio
> >> >>
> >> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >> >>
> >> >>> Thanks for doing the legwork and finding what the other vendors do.
> >> >>> It is indeed compelling that SQL Server and Postgres go beyond the
> >> >>> standard and make the TABLE keyword optional.
> >> >>>
> >> >>> I tried that syntax in Calcite and discovered that there is a clash
> >> with
> >> >>> one of our own (few) extensions. In
> >> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
> >> EXTENDS
> >> >>> clause. You can write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE golfHandicap < 10;
> >> >>>
> >> >>> to tell Calcite that there are two undeclared columns in the Emp
> table
> >> >> but
> >> >>> you would like to use them in this particular query. We chose to
> make
> >> the
> >> >>> EXTEND keyword optional, so you could instead write
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >> >>>  WHERE golfHandicap < 10;
> >> >>>
> >> >>> That is uncomfortably close to
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> >> >>>
> >> >>> so we would require
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >> >>>
> >> >>> if EmpFunction was a table-function. You could combine the two forms
> >> like
> >> >>> this:
> >> >>>
> >> >>>  SELECT *
> >> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >> >>> (anotherAttribute INTEGER);
> >> >>>
> >> >>> We could revisit whether EXTEND is optional, I suppose. But we
> should
> >> >> also
> >> >>> ask whether requiring folks to type TABLE is such a hardship.
> >> >>>
> >> >>> Julian
> >> >>>
> >> >>>
> >> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >> >>>>
> >> >>>> - Table function syntax: I did a quick search and it seems there's
> >> >>>> no consensus about this.
> >> >>>> It seems that Postgres [1] and SQL Server [2] both allow calling
> >> >>>> table functions without the table(...) wrapper while Oracle [3] and
> >> >>>> DB2 [4] expect it.
> >> >>>> MySQL does not have table functions [5]
> >> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >> >>>> Would it be reasonable to allow a switch in the grammar generation
> >> >>>> to have a Postgres-compatible syntax? Currently in Drill we use the
> >> >>>> MySQL-like syntax (back ticks for identifiers etc.)
> >> >>>>
> >> >>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >> >>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >> >>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >> >>>>
> >> >>>> - It seems a simple change in SqlCallBinding fixes the function
> >> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> >> >>>> But that seems too easy to be true. Possibly this method is called
> >> >>>> more than once (before and after the function has been resolved?)
> >> >>>>
> >> >>>> FYI this would happen only when using named parameters. We do want
> >> >>>> to overload in this case, which is why I'm looking into it.
> >> >>>>
> >> >>>> I'll file a JIRA for my other branch
> >> >>>>
> >> >>>> Julien
> >> >>>>
> >> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >> >>>>
> >> >>>>>
> >> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
> >> wrote:
> >> >>>>>
> >> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> >> when
> >> >>>>> there's more than 1 function with the same name.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes; see below.
> >> >>>>>
> >> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now
> >> >>>>> just being able to specify the delimiter for csv files.
> >> >>>>> So it seems the answer to my question 1) is that TableMacros are
> >> >>>>> the way to go.
> >> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >> >>>>> necessary?*
> >> >>>>>
> >> >>>>>
> >> >>>>> Consider:
> >> >>>>>
> >> >>>>> select * from myTable as f(x, y)
> >> >>>>> select * from myTable f(x, y)
> >> >>>>> select * from myFunction(x, y)
> >> >>>>>
> >> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar.
> Also,
> >> >> if
> >> >>> f
> >> >>>>> is a function with zero arguments, could you invoke it like this?:
> >> >>>>>
> >> >>>>> select * from f
> >> >>>>>
> >> >>>>> I don’t know the actual rationale. But I know that the SQL
> standards
> >> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> >> >>>>>
> >> >>>>> I had to fix some things in Calcite to enable this:
> >> >>>>> https://github.com/dremio/calcite/pull/1/files
> >> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used
> in
> >> >>>>> Calcite for the Maze example.
> >> >>>>> Which is why some hooks were missing.
> >> >>>>>
> >> >>>>>
> >> >>>>> Can you log a jira case to track this bug?
> >> >>>>>
> >> >>>>>
> >> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >> >>>>> Here is a test that reproduces the problem:
> >> >>>>> https://github.com/apache/calcite/pull/166
> >> >>>>> If we return more than 1 TableFunction with the same name, we get
> a
> >> >> NPE
> >> >>>>> later on.
> >> >>>>>
> >> >>>>>
> >> >>>>> Yes, I knew there was a problem with overloading. Please log a
> JIRA
> >> >> case
> >> >>>>> on resolution of overloaded functions when invoked with named
> >> >> arguments.
> >> >>>>> (It probably applies to all functions, not just table functions.)
> >> The
> >> >>> fix
> >> >>>>> will take a while (if you wait for me to write it).
> >> >>>>>
> >> >>>>> For now please tell your users not to overload. :)
> >> >>>>>
> >> >>>>>
> >> >>>>> Julian
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Julien
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > *Jim Scott*
> >> > Director, Enterprise Strategy & Architecture
> >> > +1 (347) 746-9281
> >> > @kingmesal <https://twitter.com/kingmesal>
> >> >
> >>
> >>
> >
>

Re: select from table with options

Posted by James Taylor <ja...@apache.org>.

Phoenix already supports columns at read-time using the syntax without the
EXTEND keyword, as Julian indicated:
   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
   WHERE golfHandicap < 10;

Changing this by requiring the EXTEND keyword would create a backward
compatibility problem.

I think it'd be good if both of these extensions worked in Drill & Phoenix
given our Drillix initiative.

On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> My proposal was an a or b using the freemarker template in the grammar,
> not something later.
>
> Actually, put another way: we may want to consider stating that we only
> incorporate SQL standards in our primary grammar. Any extensions should be
> optional grammar. We could simply have grammar plugins in Calcite (the same
> way we plug in external things in Drill).
>
> Trying to get every project to agree on extensions seems like it may be
> hard.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> I can see why Jacques wants this syntax.
>>
>> However a "switch" in a grammar is a bad idea. Grammars need to be
>> predictable. Any variation should happen at validation time, or later.
>>
>> Also, we shouldn’t add configuration parameters as a way of avoiding a
>> tough design discussion.
>>
>> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
>> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
>> which I mean Jacques and James, I guess) need to agree what the SQL syntax
>> should be.
>>
>> Julian
>>
>>
>> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
>> >
>> > Looking at those two examples I agree with Jacques. The first appears
>> more
>> > like a hint from the syntactic sugar point of view.
>> >
>> >
>> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> >
>> >> Since EXTEND is custom functionality, it seems reasonable that we could
>> >> have a switch. Given that SQL Server and Postgres support it seems
>> >> reasonable to support the table functions without the TABLE syntax.
>> >>
>> >> I for one definitely think the TABLE syntax is much more confusing to
>> use,
>> >> especially in the example that we're looking to support, such as:
>> >>
>> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> >> '|', skipFirstRow => true)
>> >>
>> >> This seems much clearer than:
>> >>
>> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
>> fieldDelimiter
>> >> => '|', skipFirstRow => true))
>> >>
>> >> It also looks much more like a hint to the table (which is our goal).
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Jacques Nadeau
>> >> CTO and Co-Founder, Dremio
>> >>
>> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>
>> >>> Thanks for doing the legwork and finding what the other vendors do.
>> It is
>> >>> indeed compelling that SQL Server and Postgres go beyond the standard
>> an
>> >>> make the TABLE keyword optional.
>> >>>
>> >>> I tried that syntax in Calcite and discovered that there is a clash
>> with
>> >>> one of our own (few) extensions. In
>> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the
>> EXTENDS
>> >>> clause. You can write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> to tell Calcite that there are two undeclared columns in the Emp table
>> >> but
>> >>> you would like to use them in this particular query. We chose to make
>> the
>> >>> EXTEND keyword optional, so you could instead write
>> >>>
>> >>>  SELECT *
>> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>> >>>  WHERE goldHandicap < 10;
>> >>>
>> >>> That is uncomfortably close to
>> >>>
>> >>>  SELECT *
>> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
>> >>>
>> >>> so we would require
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>> >>>
>> >>> if EmpFunction was a table-function. You could combine the two forms
>> like
>> >>> this:
>> >>>
>> >>>  SELECT *
>> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>> >>> (anotherAttribute INTEGER);
>> >>>
>> >>> We could revisit whether EXTEND is optional, I suppose. But we should
>> >> also
>> >>> ask whether requiring folks to type TABLE is such a hardship.
>> >>>
>> >>> Julian
>> >>>
>> >>>
>> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> >>>>
>> >>>> - Table function syntax: I did a quick search and it seems there's no
>> >>>> consensus about this.
>> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
>> >>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>> >>>> expect it.
>> >>>> MySQL does not have table functions [5]
>> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
>> >>>> Would it be reasonable to allow a switch in the grammar generation to
>> >>> have
>> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
>> >>>> syntax (back ticks for identifiers etc)
>> >>>>
>> >>>> [1]
>> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>> >>>> [2]
>> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>> >>>> [5]
>> >>>>
>> >>>
>> >>
>> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>> >>>>
>> >>>> - It seems a simple change in SqlCallBinding fixes the function
>> >>>> overloading: https://github.com/apache/calcite/pull/166/files
>> >>>> But that seems too easy to be true. Possibly this method is called
>> more
>> >>>> than once (before and after the function has been resolved?)
>> >>>>
>> >>>> FYI this would happen only when using named parameter. We do want to
>> >>>> overload in this case, which is why I'm looking into it.
>> >>>>
>> >>>> I'll fill a JIRA for my other branch
>> >>>>
>> >>>> Julien
>> >>>>
>> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> >>>>
>> >>>>>
>> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com>
>> wrote:
>> >>>>>
>> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
>> when
>> >>>>> there's more than 1 function with the same name.
>> >>>>>
>> >>>>>
>> >>>>> Yes; see below.
>> >>>>>
>> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
>> >>> being
>> >>>>> able to specify the delimiter for csv files.
>> >>>>> So it seem the answer to my question 1) is that TableMacros are the
>> >> way
>> >>> to
>> >>>>> go.
>> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>> >>>>> necessary?*
>> >>>>>
>> >>>>>
>> >>>>> Consider:
>> >>>>>
>> >>>>> select * from myTable as f(x, y)
>> >>>>> select * from myTable f(x, y)
>> >>>>> select * from myFunction(x, y)
>> >>>>>
>> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
>> >> if
>> >>> f
>> >>>>> is a function with zero arguments, could you invoke it like this?:
>> >>>>>
>> >>>>> select * from f
>> >>>>>
>> >>>>> I don’t know the actual rationale. But I know that the SQL standards
>> >>>>> people in their wisdom decided to add a keyword to disambiguate.
>> >>>>>
>> >>>>> I had to fix some things in Calcite to enable this:
>> >>>>> https://github.com/dremio/calcite/pull/1/files
>> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> >>>>> Calcite for the Maze example.
>> >>>>> Which is why some hooks were missing.
>> >>>>>
>> >>>>>
>> >>>>> Can you log a jira case to track this bug?
>> >>>>>
>> >>>>>
>> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
>> >>>>> Here is a test that reproduces the problem:
>> >>>>> https://github.com/apache/calcite/pull/166
>> >>>>> If we return more than 1 TableFunction with the same name, we get a
>> >> NPE
>> >>>>> later on.
>> >>>>>
>> >>>>>
>> >>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
>> >> case
>> >>>>> on resolution of overloaded functions when invoked with named
>> >> arguments.
>> >>>>> (It probably applies to all functions, not just table functions.)
>> The
>> >>> fix
>> >>>>> will take a while (if you wait for me to write it).
>> >>>>>
>> >>>>> For now please tell your users not to overload. :)
>> >>>>>
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Julien
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > *Jim Scott*
>> > Director, Enterprise Strategy & Architecture
>> > +1 (347) 746-9281
>> > @kingmesal <https://twitter.com/kingmesal>
>> >
>>
>>
>

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

My proposal was an either/or choice made through the FreeMarker template when
the grammar is generated, not something decided later.

Actually, put another way: we may want to consider stating that we only
incorporate SQL standards in our primary grammar. Any extensions should be
optional grammar. We could simply have grammar plugins in Calcite (the same
way we plug in external things in Drill).

Trying to get every project to agree on extensions seems like it may be
hard.



--
Jacques Nadeau
CTO and Co-Founder, Dremio
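
[Editor's sketch, not part of the original message.] The either/or choice described above can be pictured as a small set of options fixed when the parser is generated. The Java below is only an illustration under stated assumptions: GrammarOptions and its factory methods are hypothetical names, not Calcite or Drill classes, and the real mechanism discussed here is a FreeMarker-templated grammar rather than a runtime object. It assumes, as the quoted examples in this thread suggest, that making both TABLE and EXTEND optional at once is the combination to avoid.

```java
// Hypothetical sketch: GrammarOptions is not a Calcite or Drill class.
final class GrammarOptions {
    // Allow FROM f(args) without the TABLE(...) wrapper.
    final boolean tableKeywordOptional;
    // Allow FROM t (col TYPE, ...) without the EXTEND keyword.
    final boolean extendKeywordOptional;

    GrammarOptions(boolean tableKeywordOptional, boolean extendKeywordOptional) {
        // Assumption: with both keywords optional, "name (" could begin
        // either form, so this combination is rejected up front.
        if (tableKeywordOptional && extendKeywordOptional) {
            throw new IllegalArgumentException(
                "TABLE and EXTEND cannot both be optional in the same grammar");
        }
        this.tableKeywordOptional = tableKeywordOptional;
        this.extendKeywordOptional = extendKeywordOptional;
    }

    // Both keywords required.
    static GrammarOptions standard() {
        return new GrammarOptions(false, false);
    }

    // The variant that elides TABLE.
    static GrammarOptions tableOptional() {
        return new GrammarOptions(true, false);
    }

    // The variant that elides EXTEND.
    static GrammarOptions extendOptional() {
        return new GrammarOptions(false, true);
    }

    public static void main(String[] args) {
        System.out.println(tableOptional().tableKeywordOptional);
        try {
            new GrammarOptions(true, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Each project would pick exactly one of the three factories, which keeps every generated grammar predictable on its own even though the projects differ from each other.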

On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde <jh...@apache.org> wrote:

> I can see why Jacques wants this syntax.
>
> However a “switch" in a grammar is a bad idea. Grammars need to be
> predictable. Any variation should happen at validation time, or later.
>
> Also, we shouldn’t add configuration parameters as a way of avoiding a
> tough design discussion.
>
> EXTENDS and eliding TABLE are both extensions to standard SQL, and they
> are both applicable to Drill and Phoenix. I think Drill and Phoenix (by
> which I mean Jacques and James, I guess) need to agree what the SQL syntax
> should be.
>
> Julian
>
>
> > On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> >
> > Looking at those two examples I agree with Jacques. The first appears
> more
> > like a hint from the syntactic sugar point of view.
> >
> >
> > On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> >> Since EXTEND is custom functionality, it seems reasonable that we could
> >> have a switch. Given that SQL Server and Postgres support it seems
> >> reasonable to support the table functions without the TABLE syntax.
> >>
> >> I for one definitely think the TABLE syntax is much more confusing to
> use,
> >> especially in the example that we're looking to support, such as:
> >>
> >> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> >> '|', skipFirstRow => true)
> >>
> >> This seems much clearer than:
> >>
> >> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV',
> fieldDelimiter
> >> => '|', skipFirstRow => true))
> >>
> >> It also looks much more like a hint to the table (which is our goal).
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Jacques Nadeau
> >> CTO and Co-Founder, Dremio
> >>
> >> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> Thanks for doing the legwork and finding what the other vendors do. It
> is
> >>> indeed compelling that SQL Server and Postgres go beyond the standard
> an
> >>> make the TABLE keyword optional.
> >>>
> >>> I tried that syntax in Calcite and discovered that there is a clash
> with
> >>> one of our own (few) extensions. In
> >>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> >>> clause. You can write
> >>>
> >>>  SELECT *
> >>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>  WHERE goldHandicap < 10;
> >>>
> >>> to tell Calcite that there are two undeclared columns in the Emp table
> >> but
> >>> you would like to use them in this particular query. We chose to make
> the
> >>> EXTEND keyword optional, so you could instead write
> >>>
> >>>  SELECT *
> >>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >>>  WHERE goldHandicap < 10;
> >>>
> >>> That is uncomfortably close to
> >>>
> >>>  SELECT *
> >>>  FROM EmpFunction (favoriteBand, golfHandicap);
> >>>
> >>> so we would require
> >>>
> >>>  SELECT *
> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >>>
> >>> if EmpFunction was a table-function. You could combine the two forms
> like
> >>> this:
> >>>
> >>>  SELECT *
> >>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> >>> (anotherAttribute INTEGER);
> >>>
> >>> We could revisit whether EXTEND is optional, I suppose. But we should
> >> also
> >>> ask whether requiring folks to type TABLE is such a hardship.
> >>>
> >>> Julian
> >>>
> >>>
> >>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>>>
> >>>> - Table function syntax: I did a quick search and it seems there's no
> >>>> consensus about this.
> >>>> It seems that Posgres [1] and SQL Server [2] both allow calling table
> >>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> >>>> expect it.
> >>>> MySQL does not have table functions [5]
> >>>> 2 for, 2 against and 1 undecided: that's a draw :)
> >>>> Would it be reasonable to allow a switch in the grammar generation to
> >>> have
> >>>> a posgres compatible syntax? Currently in Drill we use the MySQL like
> >>>> syntax (back ticks for identifiers etc)
> >>>>
> >>>> [1]
> >> http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> >>>> [2]
> >> https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> >>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> >>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> >>>> [5]
> >>>>
> >>>
> >>
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >>>>
> >>>> - It seems a simple change in SqlCallBinding fixes the function
> >>>> overloading: https://github.com/apache/calcite/pull/166/files
> >>>> But that seems too easy to be true. Possibly this method is called
> more
> >>>> than once (before and after the function has been resolved?)
> >>>>
> >>>> FYI this would happen only when using named parameter. We do want to
> >>>> overload in this case, which is why I'm looking into it.
> >>>>
> >>>> I'll fill a JIRA for my other branch
> >>>>
> >>>> Julien
> >>>>
> >>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>>
> >>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>>>>
> >>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite
> when
> >>>>> there's more than 1 function with the same name.
> >>>>>
> >>>>>
> >>>>> Yes; see below.
> >>>>>
> >>>>> FYI: I have a prototype of TableMacro working in Drill. For now just
> >>> being
> >>>>> able to specify the delimiter for csv files.
> >>>>> So it seem the answer to my question 1) is that TableMacros are the
> >> way
> >>> to
> >>>>> go.
> >>>>> I'm still wondering about *3) is the table(...) wrapping syntax
> >>>>> necessary?*
> >>>>>
> >>>>>
> >>>>> Consider:
> >>>>>
> >>>>> select * from myTable as f(x, y)
> >>>>> select * from myTable f(x, y)
> >>>>> select * from myFunction(x, y)
> >>>>>
> >>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also,
> >> if
> >>> f
> >>>>> is a function with zero arguments, could you invoke it like this?:
> >>>>>
> >>>>> select * from f
> >>>>>
> >>>>> I don’t know the actual rationale. But I know that the SQL standards
> >>>>> people in their wisdom decided to add a keyword to disambiguate.
> >>>>>
> >>>>> I had to fix some things in Calcite to enable this:
> >>>>> https://github.com/dremio/calcite/pull/1/files
> >>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >>>>> Calcite for the Maze example.
> >>>>> Which is why some hooks were missing.
> >>>>>
> >>>>>
> >>>>> Can you log a jira case to track this bug?
> >>>>>
> >>>>>
> >>>>> I think I found a bug in Calcite but I'd need help to fix it.
> >>>>> Here is a test that reproduces the problem:
> >>>>> https://github.com/apache/calcite/pull/166
> >>>>> If we return more than 1 TableFunction with the same name, we get a
> >> NPE
> >>>>> later on.
> >>>>>
> >>>>>
> >>>>> Yes, I knew there was a problem with overloading. Please log a JIRA
> >> case
> >>>>> on resolution of overloaded functions when invoked with named
> >> arguments.
> >>>>> (It probably applies to all functions, not just table functions.) The
> >>> fix
> >>>>> will take a while (if you wait for me to write it).
> >>>>>
> >>>>> For now please tell your users not to overload. :)
> >>>>>
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Julien
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> > +1 (347) 746-9281
> > @kingmesal <https://twitter.com/kingmesal>
> >
>
>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

I can see why Jacques wants this syntax.

However, a “switch” in a grammar is a bad idea. Grammars need to be predictable. Any variation should happen at validation time, or later.

Also, we shouldn’t add configuration parameters as a way of avoiding a tough design discussion. 

EXTEND and eliding TABLE are both extensions to standard SQL, and they are both applicable to Drill and Phoenix. I think Drill and Phoenix (by which I mean Jacques and James, I guess) need to agree on what the SQL syntax should be.

Julian
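
[Editor's sketch, not part of the original message.] One possible reading of "variation should happen at validation time" is that the parser accepts a single shape for `name ( ... )` and a later step decides what it means by consulting the catalog. The Java below is a minimal illustration of that idea only. It is not something proposed in this thread and not Calcite's validator API; FromItemResolver, CatalogKind and Meaning are hypothetical names, and the sketch ignores the fact that the parenthesized contents differ (column definitions versus arguments).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these types are illustrative, not Calcite's validator API.
final class FromItemResolver {
    enum CatalogKind { TABLE, TABLE_FUNCTION }
    enum Meaning { TABLE_WITH_EXTRA_COLUMNS, TABLE_FUNCTION_CALL }

    // Decide what "name ( ... )" in a FROM clause means by looking the name
    // up in the catalog, after parsing has already finished.
    static Meaning resolve(String name, Map<String, CatalogKind> catalog) {
        CatalogKind kind = catalog.get(name);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown table or function: " + name);
        }
        return kind == CatalogKind.TABLE
            ? Meaning.TABLE_WITH_EXTRA_COLUMNS
            : Meaning.TABLE_FUNCTION_CALL;
    }

    public static void main(String[] args) {
        Map<String, CatalogKind> catalog = new HashMap<>();
        catalog.put("Emp", CatalogKind.TABLE);
        catalog.put("EmpFunction", CatalogKind.TABLE_FUNCTION);
        System.out.println(resolve("Emp", catalog));
        System.out.println(resolve("EmpFunction", catalog));
    }
}
```

The trade-off is that the same text can mean different things depending on catalog contents, which is the kind of surprise a required keyword avoids.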


> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> 
> Looking at those two examples I agree with Jacques. The first appears more
> like a hint from the syntactic sugar point of view.
> 
> 
> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> 
>> Since EXTEND is custom functionality, it seems reasonable that we could
>> have a switch. Given that SQL Server and Postgres support it seems
>> reasonable to support the table functions without the TABLE syntax.
>> 
>> I for one definitely think the TABLE syntax is much more confusing to use,
>> especially in the example that we're looking to support, such as:
>> 
>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> '|', skipFirstRow => true)
>> 
>> This seems much clearer than:
>> 
>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
>> => '|', skipFirstRow => true))
>> 
>> It also looks much more like a hint to the table (which is our goal).
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>> 
>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>> indeed compelling that SQL Server and Postgres go beyond the standard and
>>> make the TABLE keyword optional.
>>> 
>>> I tried that syntax in Calcite and discovered that there is a clash with
>>> one of our own (few) extensions. In
>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
>>> clause. You can write
>>> 
>>>  SELECT *
>>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE golfHandicap < 10;
>>> 
>>> to tell Calcite that there are two undeclared columns in the Emp table but
>>> you would like to use them in this particular query. We chose to make the
>>> EXTEND keyword optional, so you could instead write
>>> 
>>>  SELECT *
>>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE golfHandicap < 10;
>>> 
>>> That is uncomfortably close to
>>> 
>>>  SELECT *
>>>  FROM EmpFunction (favoriteBand, golfHandicap);
>>> 
>>> so we would require
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>> 
>>> if EmpFunction was a table-function. You could combine the two forms like
>>> this:
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
>>> (anotherAttribute INTEGER);
>>> 
>>> We could revisit whether EXTEND is optional, I suppose. But we should also
>>> ask whether requiring folks to type TABLE is such a hardship.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>> 
>>>> - Table function syntax: I did a quick search and it seems there's no
>>>> consensus about this.
>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>>>> expect it.
>>>> MySQL does not have table functions [5]
>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>> Would it be reasonable to allow a switch in the grammar generation to have
>>>> a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
>>>> syntax (back ticks for identifiers etc)
>>>> 
>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>> 
>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>> But that seems too easy to be true. Possibly this method is called more
>>>> than once (before and after the function has been resolved?)
>>>> 
>>>> FYI this would happen only when using named parameters. We do want to
>>>> overload in this case, which is why I'm looking into it.
>>>> 
>>>> I'll file a JIRA for my other branch
>>>> 
>>>> Julien
>>>> 
>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> 
>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>>>>> there's more than 1 function with the same name.
>>>>> 
>>>>> 
>>>>> Yes; see below.
>>>>> 
>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>>>>> able to specify the delimiter for csv files.
>>>>> So it seems the answer to my question 1) is that TableMacros are the way
>>>>> to go.
>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>> necessary?*
>>>>> 
>>>>> 
>>>>> Consider:
>>>>> 
>>>>> select * from myTable as f(x, y)
>>>>> select * from myTable f(x, y)
>>>>> select * from myFunction(x, y)
>>>>> 
>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>> 
>>>>> select * from f
>>>>> 
>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>> 
>>>>> I had to fix some things in Calcite to enable this:
>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>>>>> Calcite for the Maze example.
>>>>> Which is why some hooks were missing.
>>>>> 
>>>>> 
>>>>> Can you log a jira case to track this bug?
>>>>> 
>>>>> 
>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>> Here is a test that reproduces the problem:
>>>>> https://github.com/apache/calcite/pull/166
>>>>> If we return more than 1 TableFunction with the same name, we get a NPE
>>>>> later on.
>>>>> 
>>>>> 
>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>>>>> on resolution of overloaded functions when invoked with named arguments.
>>>>> (It probably applies to all functions, not just table functions.) The fix
>>>>> will take a while (if you wait for me to write it).
>>>>> 
>>>>> For now please tell your users not to overload. :)
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Julien
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
> @kingmesal <https://twitter.com/kingmesal>
> 
> <http://www.mapr.com/>
> [image: MapR Technologies] <http://www.mapr.com>
> 
> Now Available - Free Hadoop On-Demand Training
> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

I can see why Jacques wants this syntax.

However, a "switch" in a grammar is a bad idea. Grammars need to be predictable. Any variation should happen at validation time, or later.
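
To make the validation-time idea concrete, here is a minimal Java sketch. It is not Calcite or Drill code: FromItemValidator, Kind, and classify are hypothetical names, and it assumes the parser has already accepted `name (...)` in a FROM clause under one fixed grammar and only recorded whether TABLE was written. The dialect choice then becomes a policy check on the parsed item instead of a different grammar. It only illustrates where such a check could live, not whether a per-dialect policy is a good idea.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: the grammar stays fixed; dialect policy is applied at validation time. */
final class FromItemValidator {

    /** How a parsed `name (...)` item in a FROM clause ends up being interpreted. */
    enum Kind { TABLE_FUNCTION_CALL, EXTENDED_TABLE }

    private final boolean tableKeywordRequired; // dialect policy, consulted only after parsing
    private final Set<String> tableFunctions;   // names the catalog knows as table functions

    FromItemValidator(boolean tableKeywordRequired, String... tableFunctions) {
        this.tableKeywordRequired = tableKeywordRequired;
        this.tableFunctions = new HashSet<>(Arrays.asList(tableFunctions));
    }

    /** Classifies an already-parsed item, or rejects it if it violates the dialect policy. */
    Kind classify(String name, boolean hasTableKeyword) {
        boolean isFunction = tableFunctions.contains(name);
        if (hasTableKeyword && !isFunction) {
            throw new IllegalArgumentException("No table function named " + name);
        }
        if (isFunction && !hasTableKeyword && tableKeywordRequired) {
            throw new IllegalArgumentException(
                "Table function " + name + " must be wrapped in TABLE(...)");
        }
        return isFunction ? Kind.TABLE_FUNCTION_CALL : Kind.EXTENDED_TABLE;
    }

    public static void main(String[] args) {
        FromItemValidator lenient = new FromItemValidator(false, "EmpFunction");
        System.out.println(lenient.classify("EmpFunction", false));
        System.out.println(lenient.classify("Emp", false));
    }
}
```

The sketch glosses over the hard part: a real parser would still have to accept both column definitions and call arguments inside the parentheses, which is exactly where the EXTEND versus table-function clash comes from.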

Also, we shouldn’t add configuration parameters as a way of avoiding a tough design discussion. 

EXTEND and eliding TABLE are both extensions to standard SQL, and they are both applicable to Drill and Phoenix. I think Drill and Phoenix (by which I mean Jacques and James, I guess) need to agree on what the SQL syntax should be.

Julian


> On Nov 7, 2015, at 10:40 AM, Jim Scott <js...@maprtech.com> wrote:
> 
> Looking at those two examples I agree with Jacques. The first appears more
> like a hint from the syntactic sugar point of view.
> 
> 
> On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> 
>> Since EXTEND is custom functionality, it seems reasonable that we could
>> have a switch. Given that SQL Server and Postgres support it seems
>> reasonable to support the table functions without the TABLE syntax.
>> 
>> I for one definitely think the TABLE syntax is much more confusing to use,
>> especially in the example that we're looking to support, such as:
>> 
>> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
>> '|', skipFirstRow => true)
>> 
>> This seems much clearer than:
>> 
>> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
>> => '|', skipFirstRow => true))
>> 
>> It also looks much more like a hint to the table (which is our goal).
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>> 
>> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>>> Thanks for doing the legwork and finding what the other vendors do. It is
>>> indeed compelling that SQL Server and Postgres go beyond the standard and
>>> make the TABLE keyword optional.
>>>
>>> I tried that syntax in Calcite and discovered that there is a clash with
>>> one of our own (few) extensions. In
>>> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
>>> clause. You can write
>>>
>>>  SELECT *
>>>  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE golfHandicap < 10;
>>>
>>> to tell Calcite that there are two undeclared columns in the Emp table but
>>> you would like to use them in this particular query. We chose to make the
>>> EXTEND keyword optional, so you could instead write
>>>
>>>  SELECT *
>>>  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>>>  WHERE golfHandicap < 10;
>>> 
>>> That is uncomfortably close to
>>> 
>>>  SELECT *
>>>  FROM EmpFunction (favoriteBand, golfHandicap);
>>> 
>>> so we would require
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>>> 
>>> if EmpFunction was a table-function. You could combine the two forms like
>>> this:
>>> 
>>>  SELECT *
>>>  FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
>>>    EXTEND (anotherAttribute INTEGER);
>>>
>>> We could revisit whether EXTEND is optional, I suppose. But we should also
>>> ask whether requiring folks to type TABLE is such a hardship.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>> 
>>>> - Table function syntax: I did a quick search and it seems there's no
>>>> consensus about this.
>>>> It seems that Postgres [1] and SQL Server [2] both allow calling table
>>>> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
>>>> expect it.
>>>> MySQL does not have table functions [5].
>>>> 2 for, 2 against and 1 undecided: that's a draw :)
>>>> Would it be reasonable to allow a switch in the grammar generation to have
>>>> a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
>>>> syntax (backticks for identifiers, etc.)
>>>>
>>>> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
>>>> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
>>>> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
>>>> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
>>>> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
>>>> 
>>>> - It seems a simple change in SqlCallBinding fixes the function
>>>> overloading: https://github.com/apache/calcite/pull/166/files
>>>> But that seems too easy to be true. Possibly this method is called more
>>>> than once (before and after the function has been resolved?)
>>>> 
>>>> FYI this would happen only when using named parameters. We do want to
>>>> overload in this case, which is why I'm looking into it.
>>>>
>>>> I'll file a JIRA for my other branch
>>>> 
>>>> Julien
>>>> 
>>>> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> 
>>>>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>>>>> there's more than 1 function with the same name.
>>>>> 
>>>>> 
>>>>> Yes; see below.
>>>>> 
>>>>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>>>>> able to specify the delimiter for csv files.
>>>>> So it seems the answer to my question 1) is that TableMacros are the way to
>>>>> go.
>>>>> I'm still wondering about *3) is the table(...) wrapping syntax
>>>>> necessary?*
>>>>> 
>>>>> 
>>>>> Consider:
>>>>> 
>>>>> select * from myTable as f(x, y)
>>>>> select * from myTable f(x, y)
>>>>> select * from myFunction(x, y)
>>>>> 
>>>>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>>>>> is a function with zero arguments, could you invoke it like this?:
>>>>> 
>>>>> select * from f
>>>>> 
>>>>> I don’t know the actual rationale. But I know that the SQL standards
>>>>> people in their wisdom decided to add a keyword to disambiguate.
>>>>> 
>>>>> I had to fix some things in Calcite to enable this:
>>>>> https://github.com/dremio/calcite/pull/1/files
>>>>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>>>>> Calcite for the Maze example.
>>>>> Which is why some hooks were missing.
>>>>> 
>>>>> 
>>>>> Can you log a jira case to track this bug?
>>>>> 
>>>>> 
>>>>> I think I found a bug in Calcite but I'd need help to fix it.
>>>>> Here is a test that reproduces the problem:
>>>>> https://github.com/apache/calcite/pull/166
>>>>> If we return more than 1 TableFunction with the same name, we get a NPE
>>>>> later on.
>>>>> 
>>>>> 
>>>>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>>>>> on resolution of overloaded functions when invoked with named arguments.
>>>>> (It probably applies to all functions, not just table functions.) The fix
>>>>> will take a while (if you wait for me to write it).
>>>>> 
>>>>> For now please tell your users not to overload. :)
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Julien
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
> @kingmesal <https://twitter.com/kingmesal>
> 
> <http://www.mapr.com/>
> [image: MapR Technologies] <http://www.mapr.com>
> 
> Now Available - Free Hadoop On-Demand Training
> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: select from table with options

Posted by Jim Scott <js...@maprtech.com>.

Looking at those two examples I agree with Jacques. The first appears more
like a hint from the syntactic sugar point of view.


On Fri, Nov 6, 2015 at 11:53 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Since EXTEND is custom functionality, it seems reasonable that we could
> have a switch. Given that SQL Server and Postgres support it seems
> reasonable to support the table functions without the TABLE syntax.
>
> I for one definitely think the TABLE syntax is much more confusing to use,
> especially in the example that we're looking to support, such as:
>
> select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
> '|', skipFirstRow => true)
>
> This seems much clearer than:
>
> select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
> => '|', skipFirstRow => true))
>
> It also looks much more like a hint to the table (which is our goal).
>
>
>
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Thanks for doing the legwork and finding what the other vendors do. It is
> > indeed compelling that SQL Server and Postgres go beyond the standard and
> > make the TABLE keyword optional.
> >
> > I tried that syntax in Calcite and discovered that there is a clash with
> > one of our own (few) extensions. In
> > https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
> > clause. You can write
> >
> >   SELECT *
> >   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE golfHandicap < 10;
> >
> > to tell Calcite that there are two undeclared columns in the Emp table but
> > you would like to use them in this particular query. We chose to make the
> > EXTEND keyword optional, so you could instead write
> >
> >   SELECT *
> >   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
> >   WHERE golfHandicap < 10;
> >
> > That is uncomfortably close to
> >
> >   SELECT *
> >   FROM EmpFunction (favoriteBand, golfHandicap);
> >
> > so we would require
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
> >
> > if EmpFunction was a table-function. You could combine the two forms like
> > this:
> >
> >   SELECT *
> >   FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
> >     EXTEND (anotherAttribute INTEGER);
> >
> > We could revisit whether EXTEND is optional, I suppose. But we should also
> > ask whether requiring folks to type TABLE is such a hardship.
> >
> > Julian
> >
> >
> > > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >
> > > - Table function syntax: I did a quick search and it seems there's no
> > > consensus about this.
> > > It seems that Postgres [1] and SQL Server [2] both allow calling table
> > > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > > expect it.
> > > MySQL does not have table functions [5].
> > > 2 for, 2 against and 1 undecided: that's a draw :)
> > > Would it be reasonable to allow a switch in the grammar generation to have
> > > a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
> > > syntax (backticks for identifiers, etc.)
> > >
> > > [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > > [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > > [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> > >
> > > - It seems a simple change in SqlCallBinding fixes the function
> > > overloading: https://github.com/apache/calcite/pull/166/files
> > > But that seems too easy to be true. Possibly this method is called more
> > > than once (before and after the function has been resolved?)
> > >
> > > FYI this would happen only when using named parameters. We do want to
> > > overload in this case, which is why I'm looking into it.
> > >
> > > I'll file a JIRA for my other branch
> > >
> > > Julien
> > >
> > > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >>
> > >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > >>
> > >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> > >> there's more than 1 function with the same name.
> > >>
> > >>
> > >> Yes; see below.
> > >>
> > >> FYI: I have a prototype of TableMacro working in Drill. For now just being
> > >> able to specify the delimiter for csv files.
> > >> So it seems the answer to my question 1) is that TableMacros are the way to
> > >> go.
> > >> I'm still wondering about *3) is the table(...) wrapping syntax
> > >> necessary?*
> > >>
> > >>
> > >> Consider:
> > >>
> > >> select * from myTable as f(x, y)
> > >> select * from myTable f(x, y)
> > >> select * from myFunction(x, y)
> > >>
> > >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> > >> is a function with zero arguments, could you invoke it like this?:
> > >>
> > >> select * from f
> > >>
> > >> I don’t know the actual rationale. But I know that the SQL standards
> > >> people in their wisdom decided to add a keyword to disambiguate.
> > >>
> > >> I had to fix some things in Calcite to enable this:
> > >> https://github.com/dremio/calcite/pull/1/files
> > >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> > >> Calcite for the Maze example.
> > >> Which is why some hooks were missing.
> > >>
> > >>
> > >> Can you log a jira case to track this bug?
> > >>
> > >>
> > >> I think I found a bug in Calcite but I'd need help to fix it.
> > >> Here is a test that reproduces the problem:
> > >> https://github.com/apache/calcite/pull/166
> > >> If we return more than 1 TableFunction with the same name, we get a NPE
> > >> later on.
> > >>
> > >>
> > >> Yes, I knew there was a problem with overloading. Please log a JIRA case
> > >> on resolution of overloaded functions when invoked with named arguments.
> > >> (It probably applies to all functions, not just table functions.) The fix
> > >> will take a while (if you wait for me to write it).
> > >>
> > >> For now please tell your users not to overload. :)
> > >>
> > >>
> > >> Julian
> > >>
> > >>
> > >
> > >
> > > --
> > > Julien
> >
> >
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>


Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Since EXTEND is custom functionality, it seems reasonable that we could
have a switch. Given that SQL Server and Postgres support it, it seems
reasonable to support table functions without the TABLE syntax.

I for one definitely think the TABLE syntax is much more confusing to use,
especially in the example that we're looking to support, such as:

select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
'|', skipFirstRow => true)

This seems much clearer than:

select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
=> '|', skipFirstRow => true))

It also looks much more like a hint to the table (which is our goal).
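
One way to read "a hint to the table" is as per-query options layered over whatever format is configured centrally. A minimal Java sketch of that reading follows. FormatOptions and resolve are hypothetical names, not Drill APIs, and the rule that the per-query value wins is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: per-query named options layered over a centrally configured format. */
final class FormatOptions {

    private FormatOptions() {}

    /** Returns a new map holding the central defaults, with per-query options taking precedence. */
    static Map<String, Object> resolve(Map<String, Object> centralDefaults,
                                       Map<String, Object> perQuery) {
        Map<String, Object> effective = new LinkedHashMap<>(centralDefaults);
        effective.putAll(perQuery); // assumed rule: the option given in the query wins
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("type", "CSV");
        defaults.put("fieldDelimiter", ",");
        defaults.put("skipFirstRow", false);

        // Mirrors (fieldDelimiter => '|', skipFirstRow => true) from the query above.
        Map<String, Object> perQuery = new LinkedHashMap<>();
        perQuery.put("fieldDelimiter", "|");
        perQuery.put("skipFirstRow", true);

        // prints {type=CSV, fieldDelimiter=|, skipFirstRow=true}
        System.out.println(resolve(defaults, perQuery));
    }
}
```

The defaults map is copied and never modified, so the central configuration stays untouched until a user decides to update it.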







--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for doing the legwork and finding what the other vendors do. It is
> indeed compelling that SQL Server and Postgres go beyond the standard and
> make the TABLE keyword optional.
>
> I tried that syntax in Calcite and discovered that there is a clash with
> one of our own (few) extensions. In
> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND
> clause. You can write
>
>   SELECT *
>   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE golfHandicap < 10;
>
> to tell Calcite that there are two undeclared columns in the Emp table but
> you would like to use them in this particular query. We chose to make the
> EXTEND keyword optional, so you could instead write
>
>   SELECT *
>   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE golfHandicap < 10;
>
> That is uncomfortably close to
>
>   SELECT *
>   FROM EmpFunction (favoriteBand, golfHandicap);
>
> so we would require
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>
> if EmpFunction was a table-function. You could combine the two forms like
> this:
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap))
>     EXTEND (anotherAttribute INTEGER);
>
> We could revisit whether EXTEND is optional, I suppose. But we should also
> ask whether requiring folks to type TABLE is such a hardship.
>
> Julian
>
>
> > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > - Table function syntax: I did a quick search and it seems there's no
> > consensus about this.
> > It seems that Postgres [1] and SQL Server [2] both allow calling table
> > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > expect it.
> > MySQL does not have table functions [5].
> > 2 for, 2 against and 1 undecided: that's a draw :)
> > Would it be reasonable to allow a switch in the grammar generation to have
> > a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
> > syntax (backticks for identifiers, etc.)
> >
> > [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >
> > - It seems a simple change in SqlCallBinding fixes the function
> > overloading: https://github.com/apache/calcite/pull/166/files
> > But that seems too easy to be true. Possibly this method is called more
> > than once (before and after the function has been resolved?)
> >
> > FYI this would happen only when using named parameters. We do want to
> > overload in this case, which is why I'm looking into it.
> >
> > I'll file a JIRA for my other branch
> >
> > Julien
> >
> > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >>
> >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>
> >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> >> there's more than 1 function with the same name.
> >>
> >>
> >> Yes; see below.
> >>
> >> FYI: I have a prototype of TableMacro working in Drill. For now just being
> >> able to specify the delimiter for csv files.
> >> So it seems the answer to my question 1) is that TableMacros are the way to
> >> go.
> >> I'm still wondering about *3) is the table(...) wrapping syntax
> >> necessary?*
> >>
> >>
> >> Consider:
> >>
> >> select * from myTable as f(x, y)
> >> select * from myTable f(x, y)
> >> select * from myFunction(x, y)
> >>
> >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> >> is a function with zero arguments, could you invoke it like this?:
> >>
> >> select * from f
> >>
> >> I don’t know the actual rationale. But I know that the SQL standards
> >> people in their wisdom decided to add a keyword to disambiguate.
> >>
> >> I had to fix some things in Calcite to enable this:
> >> https://github.com/dremio/calcite/pull/1/files
> >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >> Calcite for the Maze example.
> >> Which is why some hooks were missing.
> >>
> >>
> >> Can you log a jira case to track this bug?
> >>
> >>
> >> I think I found a bug in Calcite but I'd need help to fix it.
> >> Here is a test that reproduces the problem:
> >> https://github.com/apache/calcite/pull/166
> >> If we return more than 1 TableFunction with the same name, we get a NPE
> >> later on.
> >>
> >>
> >> Yes, I knew there was a problem with overloading. Please log a JIRA case
> >> on resolution of overloaded functions when invoked with named arguments.
> >> (It probably applies to all functions, not just table functions.) The fix
> >> will take a while (if you wait for me to write it).
> >>
> >> For now please tell your users not to overload. :)
> >>
> >>
> >> Julian
> >>
> >>
> >
> >
> > --
> > Julien
>
>

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Since EXTEND is custom functionality, it seems reasonable that we could
have a switch. Given that SQL Server and Postgres support it seems
reasonable to support the table functions without the TABLE syntax.

I for one definitely think the TABLE syntax is much more confusing to use,
especially in the example that we're looking to support, such as:

select * from dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter =>
'|', skipFirstRow => true)

This seems much clearer than:

select * from TABLE(dfs.`/myfolder/mytable` (type => 'CSV', fieldDelimiter
=> '|', skipFirstRow => true))

It also looks much more like a hint to the table (which is our goal).







--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 9:15 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for doing the legwork and finding what the other vendors do. It is
> indeed compelling that SQL Server and Postgres go beyond the standard an
> make the TABLE keyword optional.
>
> I tried that syntax in Calcite and discovered that there is a clash with
> one of our own (few) extensions. In
> https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTENDS
> clause. You can write
>
>   SELECT *
>   FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> to tell Calcite that there are two undeclared columns in the Emp table but
> you would like to use them in this particular query. We chose to make the
> EXTEND keyword optional, so you could instead write
>
>   SELECT *
>   FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
>   WHERE goldHandicap < 10;
>
> That is uncomfortably close to
>
>   SELECT *
>   FROM EmpFunction (favoriteBand, golfHandicap);
>
> so we would require
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap));
>
> if EmpFunction was a table-function. You could combine the two forms like
> this:
>
>   SELECT *
>   FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND
> (anotherAttribute INTEGER);
>
> We could revisit whether EXTEND is optional, I suppose. But we should also
> ask whether requiring folks to type TABLE is such a hardship.
>
> Julian
>
>
> > On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >
> > - Table function syntax: I did a quick search and it seems there's no
> > consensus about this.
> > It seems that Posgres [1] and SQL Server [2] both allow calling table
> > functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> > expect it.
> > MySQL does not have table functions [5]
> > 2 for, 2 against and 1 undecided: that's a draw :)
> > Would it be reasonable to allow a switch in the grammar generation to
> have
> > a posgres compatible syntax? Currently in Drill we use the MySQL like
> > syntax (back ticks for identifiers etc)
> >
> > [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> > [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> > [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> > [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> > [5]
> >
> http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> >
> > - It seems a simple change in SqlCallBinding fixes the function
> > overloading: https://github.com/apache/calcite/pull/166/files
> > But that seems too easy to be true. Possibly this method is called more
> > than once (before and after the function has been resolved?)
> >
> > FYI this would happen only when using named parameter. We do want to
> > overload in this case, which is why I'm looking into it.
> >
> > I'll fill a JIRA for my other branch
> >
> > Julien
> >
> > On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >>
> >> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> >>
> >> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> >> there's more than 1 function with the same name.
> >>
> >>
> >> Yes; see below.
> >>
> >> FYI: I have a prototype of TableMacro working in Drill. For now just
> being
> >> able to specify the delimiter for csv files.
> >> So it seem the answer to my question 1) is that TableMacros are the way
> to
> >> go.
> >> I'm still wondering about *3) is the table(...) wrapping syntax
> >> necessary?*
> >>
> >>
> >> Consider:
> >>
> >> select * from myTable as f(x, y)
> >> select * from myTable f(x, y)
> >> select * from myFunction(x, y)
> >>
> >> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> >> is a function with zero arguments, could you invoke it like this?:
> >>
> >> select * from f
> >>
> >> I don’t know the actual rationale. But I know that the SQL standards
> >> people in their wisdom decided to add a keyword to disambiguate.
> >>
> >> I had to fix some things in Calcite to enable this:
> >> https://github.com/dremio/calcite/pull/1/files
> >> Drill uses Frameworks.getPlanner() that does not seem to be used in
> >> Calcite for the Maze example.
> >> Which is why some hooks were missing.
> >>
> >>
> >> Can you log a jira case to track this bug?
> >>
> >>
> >> I think I found a bug in Calcite but I'd need help to fix it.
> >> Here is a test that reproduces the problem:
> >> https://github.com/apache/calcite/pull/166
> >> If we return more than 1 TableFunction with the same name, we get a NPE
> >> later on.
> >>
> >>
> >> Yes, I knew there was a problem with overloading. Please log a JIRA case
> >> on resolution of overloaded functions when invoked with named arguments.
> >> (It probably applies to all functions, not just table functions.) The fix
> >> will take a while (if you wait for me to write it).
> >>
> >> For now please tell your users not to overload. :)
> >>
> >>
> >> Julian
> >>
> >>
> >
> >
> > --
> > Julien
>
>

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Thanks for doing the legwork and finding what the other vendors do. It is indeed compelling that SQL Server and Postgres go beyond the standard and make the TABLE keyword optional.

I tried that syntax in Calcite and discovered that there is a clash with one of our own (few) extensions. In https://issues.apache.org/jira/browse/CALCITE-493 we added the EXTEND clause. You can write

  SELECT *
  FROM Emp EXTEND (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE golfHandicap < 10;

to tell Calcite that there are two undeclared columns in the Emp table but you would like to use them in this particular query. We chose to make the EXTEND keyword optional, so you could instead write

  SELECT *
  FROM Emp (favoriteBand VARCHAR(100), golfHandicap INTEGER)
  WHERE golfHandicap < 10;

That is uncomfortably close to

  SELECT *
  FROM EmpFunction (favoriteBand, golfHandicap);

so we would require

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap));

if EmpFunction was a table-function. You could combine the two forms like this:

  SELECT *
  FROM TABLE(EmpFunction (favoriteBand, golfHandicap)) EXTEND (anotherAttribute INTEGER);

We could revisit whether EXTEND is optional, I suppose. But we should also ask whether requiring folks to type TABLE is such a hardship.
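To make the clash concrete, here is a rough sketch of what a parser faces once both keywords are optional. It is written in Python for brevity and is not Calcite's actual grammar; the helper names `split_top_level` and `classify_from_item` and the matching heuristic are assumptions made up for this illustration. The point is only that `name (` is a shared prefix, so the decision has to come from the contents of the parentheses.

```python
import re

# Hypothetical heuristic, not part of Calcite: an item such as
# "favoriteBand VARCHAR(100)" looks like "columnName TYPE[(length)]".
_ITEM_WITH_TYPE = re.compile(r"^[A-Za-z_]\w*\s+[A-Za-z_]\w*(\s*\(\s*\d+\s*\))?$")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    items, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items


def classify_from_item(from_item):
    """Return 'extend', 'function', or 'unknown' for text like 'Emp (a INTEGER)'."""
    match = re.match(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$", from_item, re.S)
    if match is None:
        return "unknown"
    items = split_top_level(match.group(2))
    if all(_ITEM_WITH_TYPE.match(item) for item in items):
        return "extend"    # every item looks like a column declaration
    return "function"      # otherwise treat the items as call arguments
```

Even this toy version has to read past the opening parenthesis before it can decide, which is the lookahead that a mandatory TABLE (or EXTEND) keyword avoids.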

Julian


> On Nov 6, 2015, at 2:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> - Table function syntax: I did a quick search and it seems there's no
> consensus about this.
> It seems that Postgres [1] and SQL Server [2] both allow calling table
> functions without the table(...) wrapper while Oracle [3] and DB2 [4]
> expect it.
> MySQL does not have table functions [5]
> 2 for, 2 against and 1 undecided: that's a draw :)
> Would it be reasonable to allow a switch in the grammar generation to have
> a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
> syntax (backticks for identifiers, etc.)
> 
> [1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
> [2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
> [3] https://oracle-base.com/articles/misc/pipelined-table-functions
> [4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
> [5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table
> 
> - It seems a simple change in SqlCallBinding fixes the function
> overloading: https://github.com/apache/calcite/pull/166/files
> But that seems too easy to be true. Possibly this method is called more
> than once (before and after the function has been resolved?)
> 
> FYI this would happen only when using named parameters. We do want to
> overload in this case, which is why I'm looking into it.
> 
> I'll file a JIRA for my other branch
> 
> Julien
> 
> On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> 
>> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> 
>> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
>> there's more than 1 function with the same name.
>> 
>> 
>> Yes; see below.
>> 
>> FYI: I have a prototype of TableMacro working in Drill. For now just being
>> able to specify the delimiter for csv files.
>> So it seems the answer to my question 1) is that TableMacros are the way to
>> go.
>> I'm still wondering about *3) is the table(...) wrapping syntax
>> necessary?*
>> 
>> 
>> Consider:
>> 
>> select * from myTable as f(x, y)
>> select * from myTable f(x, y)
>> select * from myFunction(x, y)
>> 
>> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
>> is a function with zero arguments, could you invoke it like this?:
>> 
>> select * from f
>> 
>> I don’t know the actual rationale. But I know that the SQL standards
>> people in their wisdom decided to add a keyword to disambiguate.
>> 
>> I had to fix some things in Calcite to enable this:
>> https://github.com/dremio/calcite/pull/1/files
>> Drill uses Frameworks.getPlanner() that does not seem to be used in
>> Calcite for the Maze example.
>> Which is why some hooks were missing.
>> 
>> 
>> Can you log a jira case to track this bug?
>> 
>> 
>> I think I found a bug in Calcite but I'd need help to fix it.
>> Here is a test that reproduces the problem:
>> https://github.com/apache/calcite/pull/166
>> If we return more than 1 TableFunction with the same name, we get a NPE
>> later on.
>> 
>> 
>> Yes, I knew there was a problem with overloading. Please log a JIRA case
>> on resolution of overloaded functions when invoked with named arguments.
>> (It probably applies to all functions, not just table functions.) The fix
>> will take a while (if you wait for me to write it).
>> 
>> For now please tell your users not to overload. :)
>> 
>> 
>> Julian
>> 
>> 
> 
> 
> -- 
> Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

- Table function syntax: I did a quick search and it seems there's no
consensus about this.
It seems that Postgres [1] and SQL Server [2] both allow calling table
functions without the table(...) wrapper while Oracle [3] and DB2 [4]
expect it.
MySQL does not have table functions [5]
2 for, 2 against and 1 undecided: that's a draw :)
Would it be reasonable to allow a switch in the grammar generation to have
a Postgres-compatible syntax? Currently in Drill we use the MySQL-like
syntax (backticks for identifiers, etc.)

[1] http://www.postgresql.org/docs/7.3/static/xfunc-tablefunctions.html
[2] https://technet.microsoft.com/en-us/library/aa214485(v=sql.80).aspx
[3] https://oracle-base.com/articles/misc/pipelined-table-functions
[4] http://www.ibm.com/developerworks/ibmi/library/i-power-of-udtf/
[5] http://stackoverflow.com/questions/12163666/mysql-function-to-return-a-table

- It seems a simple change in SqlCallBinding fixes the function
overloading: https://github.com/apache/calcite/pull/166/files
But that seems too easy to be true. Possibly this method is called more
than once (before and after the function has been resolved?)

FYI this would happen only when using named parameters. We do want to
overload in this case, which is why I'm looking into it.

I'll file a JIRA for my other branch

Julien

On Thu, Nov 5, 2015 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:

>
> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when
> there's more than 1 function with the same name.
>
>
> Yes; see below.
>
> FYI: I have a prototype of TableMacro working in Drill. For now just being
> able to specify the delimiter for csv files.
> So it seems the answer to my question 1) is that TableMacros are the way to
> go.
> I'm still wondering about *3) is the table(...) wrapping syntax
> necessary?*
>
>
> Consider:
>
> select * from myTable as f(x, y)
> select * from myTable f(x, y)
> select * from myFunction(x, y)
>
> #1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f
> is a function with zero arguments, could you invoke it like this?:
>
> select * from f
>
> I don’t know the actual rationale. But I know that the SQL standards
> people in their wisdom decided to add a keyword to disambiguate.
>
> I had to fix some things in Calcite to enable this:
> https://github.com/dremio/calcite/pull/1/files
> Drill uses Frameworks.getPlanner() that does not seem to be used in
> Calcite for the Maze example.
> Which is why some hooks were missing.
>
>
> Can you log a jira case to track this bug?
>
>
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166
> If we return more than 1 TableFunction with the same name, we get a NPE
> later on.
>
>
> Yes, I knew there was a problem with overloading. Please log a JIRA case
> on resolution of overloaded functions when invoked with named arguments.
> (It probably applies to all functions, not just table functions.) The fix
> will take a while (if you wait for me to write it).
>
> For now please tell your users not to overload. :)
>
>
> Julian
>
>


-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

> On Nov 5, 2015, at 5:00 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> TL;DR: TableMacro works for me; I need help with a bug in Calcite when there's more than 1 function with the same name.

Yes; see below.

> FYI: I have a prototype of TableMacro working in Drill. For now just being able to specify the delimiter for csv files.
> So it seems the answer to my question 1) is that TableMacros are the way to go.
> I'm still wondering about 3) is the table(...) wrapping syntax necessary?

Consider:

select * from myTable as f(x, y)
select * from myTable f(x, y)
select * from myFunction(x, y)

#1 and #2 mean the same thing; #2 and #3 look awfully similar. Also, if f is a function with zero arguments, could you invoke it like this?:

select * from f

I don’t know the actual rationale. But I know that the SQL standards people in their wisdom decided to add a keyword to disambiguate.

> I had to fix some things in Calcite to enable this: https://github.com/dremio/calcite/pull/1/files
> Drill uses Frameworks.getPlanner() that does not seem to be used in Calcite for the Maze example.
> Which is why some hooks were missing.

Can you log a jira case to track this bug?

> 
> I think I found a bug in Calcite but I'd need help to fix it.
> Here is a test that reproduces the problem:
> https://github.com/apache/calcite/pull/166
> If we return more than 1 TableFunction with the same name, we get a NPE later on.

Yes, I knew there was a problem with overloading. Please log a JIRA case on resolution of overloaded functions when invoked with named arguments. (It probably applies to all functions, not just table functions.) The fix will take a while (if you wait for me to write it).

For now please tell your users not to overload. :)
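For readers following along, the resolution problem can be sketched in a few lines. This is an illustration in Python under assumed semantics (a candidate matches when its parameter names equal the names supplied in the call); it is not the code in Calcite's SqlCallBinding and not the change proposed in the pull request above. The names `Overload` and `resolve_by_named_args` are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Overload:
    """One candidate function: a name plus its ordered parameter names."""
    name: str
    params: Tuple[str, ...]


def resolve_by_named_args(candidates: Sequence[Overload],
                          name: str,
                          arg_names: Sequence[str]) -> Optional[Overload]:
    """Pick the single overload whose parameter names equal the supplied names.

    Returns None when nothing matches and raises when the call is ambiguous,
    instead of letting an unresolved lookup surface later as a null.
    """
    wanted = set(arg_names)
    matches = [c for c in candidates
               if c.name == name and set(c.params) == wanted]
    if len(matches) > 1:
        raise ValueError(f"ambiguous call to {name}: {matches}")
    return matches[0] if matches else None


# Two functions sharing one name, as in the failing scenario described above.
candidates = [
    Overload("delimitedFile", ("path", "delimiter")),
    Overload("delimitedFile", ("path",)),
]
# Named arguments may arrive in any order.
chosen = resolve_by_named_args(candidates, "delimitedFile", ["delimiter", "path"])
```

A real resolver would also have to consider argument types and optional parameters; the sketch ignores both.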

Julian

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

TL;DR: TableMacro works for me; I need help with a bug in Calcite when
there's more than 1 function with the same name.

FYI: I have a prototype of TableMacro working in Drill. For now it only lets you
specify the delimiter for csv files.
So it seems the answer to my question 1) is that TableMacros are the way to
go.
I'm still wondering about *3) is the table(...) wrapping syntax necessary?*
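As an illustration of what "specifying the delimiter" per query amounts to, the sketch below overlays per-query options on a centrally configured format. It is a Python stand-in, not Drill code: `TextFormat`, its fields, and `with_query_options` are hypothetical names, and the default values are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextFormat:
    """Hypothetical stand-in for a centrally configured text format."""
    delimiter: str = ","
    extract_header: bool = False


def with_query_options(base: TextFormat, **options) -> TextFormat:
    """Return a copy of `base` with the per-query options applied.

    Unknown option names raise TypeError (from dataclasses.replace), so a
    typo in the query is reported instead of being silently ignored.
    """
    return replace(base, **options)


central = TextFormat()                                  # the central configuration
per_query = with_query_options(central, delimiter="|")  # options named in the query
```

The central configuration is left untouched, which matches the workflow from the start of the thread: experiment in the query first, then update the central configuration once the right settings are known.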

I had to fix some things in Calcite to enable this:
https://github.com/dremio/calcite/pull/1/files
Drill uses Frameworks.getPlanner(), which does not seem to be used by the Maze
example in Calcite; that is why some hooks were missing.

I think I found a bug in Calcite but I'd need help to fix it.
Here is a test that reproduces the problem:
https://github.com/apache/calcite/pull/166
If we return more than 1 TableFunction with the same name, we get a NPE
later on.


On Wed, Nov 4, 2015 at 9:49 AM, Julien Le Dem <ju...@dremio.com> wrote:

> Looking in more detail, DrillTable already has toRel implemented; I
> just need to add "implements TranslatableTable".
> I'll try to implement TableMacro and see what happens.
>
> On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> FYI: here are the changes to Calcite I did on the Drill fork:
>> https://github.com/mapr/incubator-calcite/pull/4/files
>> I'll port to the calcite master.
>>
>> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>
>>> Thanks Julian,
>>> I have looked into using Table Functions in Drill. I had to make some
>>> modifications to the planner so that the function lookup in the Storage
>>> plugin works. I will submit a patch for that.
>>>
>>> I had a few questions:
>>>  *1)* For this particular use case it seems that we could use
>>> TableMacro as all the logic can be happening in the planner. Should I look
>>> into that?
>>>    - Drill Schema returns a DrillTable (which implements Table).
>>>    - A TableMacro returns a TranslatableTable
>>>    - It is not clear to me what a TableFunction returns as it defines
>>> only methods that return types.
>>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>>> only difference with getTable is that we use the function parameters when
>>> producing a table.
>>> For reference: Drill getTable there:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>>> It indirectly calls:
>>>
>>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>>
>>>  *2)* The getFunctions method in Schema does not seem to be aware at
>>> all of the context it is called in. I would want to return different
>>> functions depending on where we are in the query (table functions in the
>>> from clause, regular functions in where). Is there a way to know if we are
>>> in the context of a FROM or a WHERE clause?
>>>
>>>  *3)* is the table(...) wrapping syntax necessary?
>>> Note:
>>>   - In Drill, backticks are used for identifiers containing a dot or slash,
>>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>>   - single quotes are used to delimit strings: 'my string passed as a
>>> parameter'
>>>
>>>   The current syntax is something like:
>>>      select * from table(dfs.delimitedFile(path => '/path/to/file', delimiter => '|'))
>>>      select * from table(dfs.`/path/to/file`(type => 'text', delimiter => '|'))
>>>      select * from table(dfs.`/path/to/file`(type => 'json'))
>>> It seems that table(...) is redundant since we are in the from clause.
>>>  It could simply be:
>>>      select * from dfs.delimitedFile(path => '/path/to/file', delimiter => '|')
>>>      select * from dfs.`/path/to/file`(type => 'text', delimiter => '|')
>>>      select * from dfs.`/path/to/file`(type => 'json')
>>>
>>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>>> parameter? (note the backticks instead of single quotes)
>>>      select * from dfs.delimitedFile(table => dfs.`/path/to/file`, delimiter => '|')
>>>
>>> Thank you!
>>> Julien
>>>
>>>
>>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>>> wrote:
>>>> > Agreed. We need both select with option and .drill (by etl process or
>>>> by
>>>> > sql ascribe metadata).
>>>> >
>>>> > Let's start with the select with options. My only goal would be to
>>>> make
>>>> > sure that creation of .drill file through SQL uses a similar pattern
>>>> to the
>>>> > select with options. It is also important that tables names are still
>>>> > expressed as identifiers instead of strings (people already have
>>>> enough
>>>> > trouble with remembering whether to use single quotes or backticks).
>>>> If the
>>>> > table function approach is everybody's preferred approach, I think it
>>>> is
>>>> > important to have named parameters per Julian's notes.
>>>> >
>>>> > @Julian, how hard do you think it will be to add named parameters?
>>>>
>>>> I just checked in a fix for
>>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Looking in more detail: DrillTable already has toRel implemented, so I
just need to add "implements TranslatableTable".
I'll try to implement TableMacro and see what happens.
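
As a rough illustration of the TableMacro idea (planner-time only: arguments
in, translatable table out), here is a minimal Java sketch. The nested
interfaces are simplified stand-ins for Calcite's Table, TranslatableTable and
TableMacro, not their real signatures, and FileTable and FileMacro are
hypothetical names standing in for DrillTable and a Drill-provided macro.

```java
import java.util.Arrays;
import java.util.List;

public class TableMacroSketch {
    // Simplified stand-ins for the Calcite interfaces; NOT the real signatures.
    public interface Table { String describe(); }
    public interface TranslatableTable extends Table { String toRel(); }
    public interface TableMacro { TranslatableTable apply(List<Object> arguments); }

    // Hypothetical stand-in for DrillTable: toRel already exists, so the class
    // only has to declare that it implements TranslatableTable.
    public static final class FileTable implements TranslatableTable {
        private final String path;
        private final String type;

        public FileTable(String path, String type) {
            this.path = path;
            this.type = type;
        }

        @Override public String describe() { return type + ":" + path; }

        // A string stands in for the relational expression a planner would build.
        @Override public String toRel() { return "Scan(" + describe() + ")"; }
    }

    // Hypothetical macro: resolved while planning, it turns the call's
    // arguments into a table and produces no rows itself.
    public static final class FileMacro implements TableMacro {
        @Override public TranslatableTable apply(List<Object> arguments) {
            String path = (String) arguments.get(0);
            String type = (String) arguments.get(1);
            return new FileTable(path, type);
        }
    }

    public static void main(String[] args) {
        TranslatableTable table =
            new FileMacro().apply(Arrays.<Object>asList("/path/to/file", "json"));
        System.out.println(table.toRel()); // prints Scan(json:/path/to/file)
    }
}
```

The point of the sketch is only the shape: the macro's result is a table the
planner can translate directly, so no separate execution-time function is
needed.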

On Tue, Nov 3, 2015 at 6:07 PM, Julien Le Dem <ju...@dremio.com> wrote:

> FYI: here are the change to Calcite I did on the Drill fork:
> https://github.com/mapr/incubator-calcite/pull/4/files
> I'll port to the calcite master.
>
> On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
>> Thanks Julian,
>> I have looked into using Table Functions in Drill. I had to make some
>> modifications to the planner so that the function lookup in the Storage
>> plugin works. I will submit a patch for that.
>>
>> I had a few questions:
>>  *1)* For this particular use case it seems that we could use TableMacro
>> as all the logic can be happening in the planner. Should I look into that?
>>    - Drill Schema returns a DrillTable (which implements Table).
>>    - A TableMacro returns a TranslatableTable
>>    - It is not clear to me what a TableFunction returns as it defines
>> only methods that return types.
>>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
>> only difference with getTable is that we use the function parameters when
>> producing a table.
>> For reference: Drill getTable there:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
>> It indirectly calls:
>>
>> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>>
>>  *2)* The getFunctions method in Schema does not seem to be aware at all
>> of the context it is called in. I would want to return different functions
>> depending on where we are in the query (table functions in the from clause,
>> regular functions in where). Is there a way to know if we are in the
>> context of a FROM or a WHERE clause?
>>
>>  *3)* is the table(...) wrapping syntax necessary?
>> Note:
>>   - In Drill back ticks are use for identifiers containing dot or slash.
>> like the path to the file as a table name: dfs.`/path/to/file.ext`
>>   - single quotes are used to delimit strings: 'my string passed as a
>> parameter'
>>
>>   The current syntax is something like:
>> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
>> delimiter => '|'))*
>> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
>> It seems that table(...) is redundant since we are in the from clause.
>>  It could simply be:
>> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
>> => '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
>> '|')*
>> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>>
>>  *4)* Can a table be a parameter? If yes, how do we declare a table
>> parameter? (not the backticks instead of single quotes)
>> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
>> delimiter => '|')*
>>
>> Thank you!
>> Julien
>>
>>
>> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>>
>>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>>> wrote:
>>> > Agreed. We need both select with option and .drill (by etl process or
>>> by
>>> > sql ascribe metadata).
>>> >
>>> > Let's start with the select with options. My only goal would be to make
>>> > sure that creation of .drill file through SQL uses a similar pattern
>>> to the
>>> > select with options. It is also important that tables names are still
>>> > expressed as identifiers instead of strings (people already have enough
>>> > trouble with remembering whether to use single quotes or backticks).
>>> If the
>>> > table function approach is everybody's preferred approach, I think it
>>> is
>>> > important to have named parameters per Julian's notes.
>>> >
>>> > @Julian, how hard do you think it will be to add named parameters?
>>>
>>> I just checked in a fix for
>>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

FYI: here are the changes I made to Calcite on the Drill fork:
https://github.com/mapr/incubator-calcite/pull/4/files
I'll port them to Calcite master.

On Tue, Nov 3, 2015 at 5:17 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Thanks Julian,
> I have looked into using Table Functions in Drill. I had to make some
> modifications to the planner so that the function lookup in the Storage
> plugin works. I will submit a patch for that.
>
> I had a few questions:
>  *1)* For this particular use case it seems that we could use TableMacro
> as all the logic can be happening in the planner. Should I look into that?
>    - Drill Schema returns a DrillTable (which implements Table).
>    - A TableMacro returns a TranslatableTable
>    - It is not clear to me what a TableFunction returns as it defines only
> methods that return types.
>  Ideally I'd like to produce a DrillTable like getTable in Schema, the
> only difference with getTable is that we use the function parameters when
> producing a table.
> For reference: Drill getTable there:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
> It indirectly calls:
>
> https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317
>
>  *2)* The getFunctions method in Schema does not seem to be aware at all
> of the context it is called in. I would want to return different functions
> depending on where we are in the query (table functions in the from clause,
> regular functions in where). Is there a way to know if we are in the
> context of a FROM or a WHERE clause?
>
>  *3)* is the table(...) wrapping syntax necessary?
> Note:
>   - In Drill back ticks are use for identifiers containing dot or slash.
> like the path to the file as a table name: dfs.`/path/to/file.ext`
>   - single quotes are used to delimit strings: 'my string passed as a
> parameter'
>
>   The current syntax is something like:
> *     select * from table(dfs.delimitedFile(path => '/path/to/file',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'text',
> delimiter => '|'))*
> *     select * from table(dfs.`**/path/to/file`**(type => 'json'))*
> It seems that table(...) is redundant since we are in the from clause.
>  It could simply be:
> *     select * from dfs.delimitedFile(path => '/path/to/file', delimiter
> => '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'text', delimiter =>
> '|')*
> *     select * from dfs.`**/path/to/file`**(type => 'json')*
>
>  *4)* Can a table be a parameter? If yes, how do we declare a table
> parameter? (not the backticks instead of single quotes)
> *     select * from dfs.delimitedFile(table => dfs.`/path/to/file`,
> delimiter => '|')*
>
> Thank you!
> Julien
>
>
> On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
>> wrote:
>> > Agreed. We need both select with option and .drill (by etl process or by
>> > sql ascribe metadata).
>> >
>> > Let's start with the select with options. My only goal would be to make
>> > sure that creation of .drill file through SQL uses a similar pattern to
>> the
>> > select with options. It is also important that tables names are still
>> > expressed as identifiers instead of strings (people already have enough
>> > trouble with remembering whether to use single quotes or backticks). If
>> the
>> > table function approach is everybody's preferred approach, I think it is
>> > important to have named parameters per Julian's notes.
>> >
>> > @Julian, how hard do you think it will be to add named parameters?
>>
>> I just checked in a fix for
>> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>>
>
>
>
> --
> Julien
>



-- 
Julien

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

Thanks Julian,
I have looked into using Table Functions in Drill. I had to make some
modifications to the planner so that the function lookup in the Storage
plugin works. I will submit a patch for that.

I had a few questions:
 *1)* For this particular use case it seems that we could use TableMacro,
since all the logic can happen in the planner. Should I look into that?
   - Drill Schema returns a DrillTable (which implements Table).
   - A TableMacro returns a TranslatableTable
   - It is not clear to me what a TableFunction returns as it defines only
methods that return types.
 Ideally I'd like to produce a DrillTable as getTable in Schema does; the only
difference from getTable is that we use the function parameters when
producing the table.
For reference, Drill's getTable is here:
https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L235
It indirectly calls:
https://github.com/apache/drill/blob/bb69f2202ed6115b39bd8681e59c6ff6091e9b9e/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java#L317

 *2)* The getFunctions method in Schema does not seem to be aware at all of
the context it is called in. I would want to return different functions
depending on where we are in the query (table functions in the from clause,
regular functions in where). Is there a way to know if we are in the
context of a FROM or a WHERE clause?

 *3)* Is the table(...) wrapping syntax necessary?
Note:
  - In Drill, backticks are used for identifiers containing a dot or slash,
such as the path to a file used as a table name: dfs.`/path/to/file.ext`
  - Single quotes are used to delimit strings: 'my string passed as a
parameter'

  The current syntax is something like:
     select * from table(dfs.delimitedFile(path => '/path/to/file', delimiter => '|'))
     select * from table(dfs.`/path/to/file`(type => 'text', delimiter => '|'))
     select * from table(dfs.`/path/to/file`(type => 'json'))
It seems that table(...) is redundant since we are in the FROM clause.
 It could simply be:
     select * from dfs.delimitedFile(path => '/path/to/file', delimiter => '|')
     select * from dfs.`/path/to/file`(type => 'text', delimiter => '|')
     select * from dfs.`/path/to/file`(type => 'json')

 *4)* Can a table be a parameter? If yes, how do we declare a table
parameter? (Note the backticks instead of single quotes.)
     select * from dfs.delimitedFile(table => dfs.`/path/to/file`, delimiter => '|')

Thank you!
Julien
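
To make the named-parameter form in question 3 concrete, here is a minimal
Java sketch of binding name => value arguments to per-query format options.
NamedOptionsSketch, FormatOptions and bind are hypothetical names, not Drill
or Calcite classes; the option names type and delimiter come from the example
queries, and treating type as required is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedOptionsSketch {
    /** Hypothetical holder for per-query format options; not a Drill class. */
    public static final class FormatOptions {
        public final String type;
        public final String delimiter; // may be null, e.g. for type => 'json'

        FormatOptions(String type, String delimiter) {
            this.type = type;
            this.delimiter = delimiter;
        }
    }

    /** Binds name => value pairs to options, rejecting unknown names. */
    public static FormatOptions bind(Map<String, String> named) {
        String type = null;
        String delimiter = null;
        for (Map.Entry<String, String> e : named.entrySet()) {
            switch (e.getKey()) {
                case "type":
                    type = e.getValue();
                    break;
                case "delimiter":
                    delimiter = e.getValue();
                    break;
                default:
                    throw new IllegalArgumentException("unknown option: " + e.getKey());
            }
        }
        if (type == null) {
            // Assumption for this sketch: the format type must always be given.
            throw new IllegalArgumentException("missing required option: type");
        }
        return new FormatOptions(type, delimiter);
    }

    public static void main(String[] args) {
        // Mirrors: dfs.`/path/to/file`(type => 'text', delimiter => '|')
        Map<String, String> named = new LinkedHashMap<>();
        named.put("type", "text");
        named.put("delimiter", "|");
        FormatOptions options = bind(named);
        System.out.println(options.type + " " + options.delimiter); // prints text |
    }
}
```

Because arguments are matched by name, their order in the query does not
matter, and a misspelled option fails fast instead of being silently ignored.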


On Sun, Nov 1, 2015 at 8:54 AM, Julian Hyde <jh...@apache.org> wrote:

> On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> > Agreed. We need both select with option and .drill (by etl process or by
> > sql ascribe metadata).
> >
> > Let's start with the select with options. My only goal would be to make
> > sure that creation of .drill file through SQL uses a similar pattern to
> the
> > select with options. It is also important that tables names are still
> > expressed as identifiers instead of strings (people already have enough
> > trouble with remembering whether to use single quotes or backticks). If
> the
> > table function approach is everybody's preferred approach, I think it is
> > important to have named parameters per Julian's notes.
> >
> > @Julian, how hard do you think it will be to add named parameters?
>
> I just checked in a fix for
> https://issues.apache.org/jira/browse/CALCITE-941. Check it out.
>



-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

On Sun, Oct 25, 2015 at 10:13 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> Agreed. We need both select with option and .drill (by etl process or by
> sql ascribe metadata).
>
> Let's start with the select with options. My only goal would be to make
> sure that creation of .drill file through SQL uses a similar pattern to the
> select with options. It is also important that tables names are still
> expressed as identifiers instead of strings (people already have enough
> trouble with remembering whether to use single quotes or backticks). If the
> table function approach is everybody's preferred approach, I think it is
> important to have named parameters per Julian's notes.
>
> @Julian, how hard do you think it will be to add named parameters?

I just checked in a fix for
https://issues.apache.org/jira/browse/CALCITE-941. Check it out.

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Agreed. We need both select with options and .drill (created by an ETL
process or by SQL that ascribes metadata).

Let's start with the select with options. My only goal would be to make
sure that creation of a .drill file through SQL uses a similar pattern to the
select with options. It is also important that table names are still
expressed as identifiers instead of strings (people already have enough
trouble remembering whether to use single quotes or backticks). If the
table function approach is everybody's preferred approach, I think it is
important to have named parameters per Julian's notes.

@Julian, how hard do you think it will be to add named parameters?

@Julien, do you want to come up with a more formal proposal?



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Oct 21, 2015 at 5:21 PM, Julien Le Dem <ju...@dremio.com> wrote:

> TL;DR: I agree that there is some overlap, but I meant to say that "Select
> with Option" is needed even if we have a .drill feature.
>
> I'm assuming the .drill file has to be collocated with the data being
> analyzed.
> For example, in HDFS, if I have /logs/foo/2015/10/21/my.log, I add a .drill
> file at /logs/foo/2015/10/21/.drill, and Drill will look in the parent
> directory for a .drill file (and possibly the parent's parent, recursively).
>
> Which is why I was mentioning in my previous email that the analyst often
> has read only access. (the data being produced by ETL or something else)
> Also the user (probably analyst) trying to query the data may not want to:
>  - have to edit a config file
>  - modify configuration that applies to everyone reading the data
>
> So I was thinking of 2 use cases:
>  - try to read the data without having to change config that applies to
> every user (this thread)
>  - set config that configures the system for everyone (.drill file)
>
>
>
>
> On Wed, Oct 21, 2015 at 5:02 PM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
>
> > Can you elaborate on what you mean by .drill being a different use case?
> >
> > In my mind, .drill has 2 use cases - a way to specify hints to Drill on
> > how to read certain datasets (and potentially optimize the queries on the
> > datasets) and a way to save the definitions of objects created via Drill
> > for reuse/access from BI tools. Both these (i.e existing or external
> tables
> > vs Drill created or internal tables) currently are not differentiated in
> > Drill, hence I believe can use the same model in terms of metadata
> > handling.
> >
> > I would be interested in knowing your thoughts.
> > -Neeraja
> >
> > On Wed, Oct 21, 2015 at 4:55 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >
> > > I think of .drill files as a different use case but there is
> potentially
> > > some overlap.
> > > Some things to keep in mind:
> > >  - The person analyzing the data has often read-only access.
> > >  - having to write a config file on one end of the system and then
> query
> > on
> > > the other end is not analyst friendly
> > >
> > > We should definitely keep .drill in mind while designing this.
> > > Although I'm thinking we should probably discuss .drill on a separate
> > > thread.
> > >
> > >
> > >
> > > On Wed, Oct 21, 2015 at 3:55 PM, Neeraja Rentachintala <
> > > nrentachintala@maprtech.com> wrote:
> > >
> > > > Another alternative is to specify a metadata file (.drill files)
> > > > that came up in some of the earlier discussions to solve similar use
> > > cases.
> > > > Rather than centrally defining configurations in storage plugin
> (which
> > is
> > > > what Drill does today), .drill files allow more granularity ,
> > potentially
> > > > at folder or individual file which will override the central
> > > configuration.
> > > >
> > > > I think the benefit of the metadata file is it can be used for other
> > > > purposes (such as stats, etc.). Another benefit is that if you are
> > > using a
> > > > BI/query tool to trigger Drill SQL queries, this will work seamlessly
> > > > rather than having to rewrite the query for custom syntax.
> > > >
> > > > I would like to know what others think of this approach.
> > > >
> > > > -Neeraja
> > > >
> > > >
> > > >
> > > > On Wed, Oct 21, 2015 at 3:43 PM, Julien Le Dem <ju...@dremio.com>
> > > wrote:
> > > >
> > > > > I like the approach of using views to add missing metadata to an
> > > existing
> > > > > raw dataset. The raw datasets stay the same and the view becomes
> the
> > > > > interface to the data. Nothing is mutated and we know how things
> are
> > > > > derived from one another.
> > > > >
> > > > > > TLDR: I'm trying to summarize the options below and add a few
> > > thoughts:
> > > > > (please comment whether you think upside/downside elements are
> valid)
> > > > > Let's refer to those options with the name in *bold*
> > > > >
> > > > > - *urls style params* (my initial strawman): *select * from
> > > > > dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> > > > > Extensibility is ensured through the storage plugin interpretation
> of
> > > the
> > > > > path.
> > > > > I agree with decomposing the syntax of format vs the data path/url.
> > so
> > > > this
> > > > > would conflict with HTTP query parameters.
> > > > > I will not pursue this one but I think it was a great conversation
> > > > starter!
> > > > >
> > > > > - *table functions*: *select *
> > > > > from delimitedFile(dfs.`default`.`/path/to/file/something.psv`,
> '|')*
> > > > > It sounds like these would have to be defined in the corresponding
> > > > > StoragePlugin as they need intimate knowledge of the underlying
> > > storage.
> > > > > Extensibility would come through the storage plugin defining those?
> > > > > +1 on named parameter vs just positional.
> > > > > The possible downside is a syntax that could be a little foreign to
> > the
> > > > > data analyst.
> > > > >
> > > > > - *fake-o parameters*. *select *
> > > > > from dfs.`default`.`/path/to/file/something.psv` where
> > > > magicFieldDelimiter
> > > > > = '|';*
> > > > > I would be inclined to avoid using filters for something that
> changes
> > > > what
> > > > > the data looks like.
> > > > > This could be unintuitive to users. (at least it feels that way to
> > me)
> > > > > Extensibility would come through the storage plugin defining
> > predicate
> > > > push
> > > > > down rules?
> > > > >
> > > > > - *WITH USING OPTIONS*:
> > > > > In general it would feel more natural to me to put those options as
> > part
> > > > of
> > > > > a select statement. the select statement can always be used in a
> WITH
> > > > > clause.
> > > > > Extensibility would come through the storage plugin receiving those
> > > > > options? As the with statement applies to a full select statement
> > with
> > > > > potentially joins, how would we know where to apply those options?
> > > > > We have two sub-options:
> > > > >   - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar
> to
> > > > > fake-o
> > > > > parameters (above), same comment as above. *AND* seems out of
> > place,
> > > > for
> > > > > example OR would be forbidden.
> > > > >   - *{ type: "text", lineDelimiter: "\n"}*: The advantage of this
> > is
> > > > that
> > > > > you can re-use the same syntax in the configuration file. This is a
> > > plus
> > > > > for consistency. Users would figure out what works and would just
> > have
> > > to
> > > > > put it in the central configuration once done.
> > > > >
> > > > > - *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns
> > > list]
> > > > > OPTIONS (type 'text', lineDelimiter);*
> > > > > What would be the column list in that case? Would it be awkward to
> > use
> > > > > EXTEND without a column list?
> > > > > Extensibility would come through the storage plugin receiving those
> > > > extend
> > > > > options.
> > > > > It sounds like they could be simply SELECT OPTIONS?
> > > > >
> > > > > - *Specific syntax*: *select * FROM mydb.mytable*
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > *    TREAT AS TEXT      USING        LINE DELIMITER '\n'        AND
> > > FIELD
> > > > > DELIMITER ','        AND TREAT FIRST ROW AS HEADERS*
> > > > > Extensibility would come through the storage plugin defining a sub
> > > > grammar
> > > > > for those options.
> > > > > Possibly this is harder to implement for the storage plugin
> > > implementor.
> > > > > Upside is the user has a SQL like syntax to specify this (although,
> > > I've
> > > > > never been fond of parts where SQL is trying to be plain English)
> > > > >
> > > > >
> > > > > *Appendix*: :P
> > > > > For what it's worth, here is how Pig does it: *LOAD 'data' [USING
> > > > function]
> > > > > [AS schema];*
> > > > > - Just *LOAD '/path/to/my/file'* will use the default Loader (tab
> > > > separated
> > > > > values)
> > > > > - adding a custom Loader* LOAD 'data' USING
> MyCustomLoader('param1',
> > > > > 'param2'); *is how you implement custom formats or remote locations
> > > > (HBase,
> > > > > Web service, ...)
> > > > > So you can use the default loader with a different separator
> (*USING
> > > > > PigStorage('\t')*) in parameter or write your own.
> > > > > - The AS clause lets you set the schema. You could have a csv file
> > > > without
> > > > > header and define the names and types of each column that way.
> > > > > Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org>
> > wrote:
> > > > >
> > > > > > Whatever API is used to scan files from SQL, there will need to
> be
> > a
> > > > > > corresponding way to accomplish the same thing in a user
> interface.
> > > > > > Probably a form with various fields, some of them with drop-boxes
> > > etc.
> > > > > >
> > > > > > And ideally a facility that samples a few hundred rows to deduce
> > the
> > > > > > probable field names and types and which fields are unique.
> > > > > >
> > > > > > I think that the UI is the true "user friendly" interface. A
> usage
> > > > > > pattern might be for people to define a data source in the
> > UI,
> > > > > > save it as a view, then switch to the command line to write
> queries
> > > on
> > > > > > that view.
> > > > > >
> > > > > > There are other use cases similar to reading files. For example
> you
> > > > > > would like to read data from an HTTP URL. You might want to
> specify
> > > > > > similar parameters for formats, compression, parsing, and
> > parameters
> > > > > > in the file URI that describe a family of partitioned files. A
> URL
> > > > > > might allow push-down of filters, projects and sorts. But still
> you
> > > > > > would want to specify formats, compression and parsing the same
> way
> > > as
> > > > > > reading files.
> > > > > >
> > > > > > To me, this argues for decomposing the file scan syntax into
> pieces
> > > > > > that can be re-used if you get data from places other than files.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > >
> > > > > > On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <
> > jacques@dremio.com>
> > > > > > wrote:
> > > > > > >> This fourth is also least extensible and thus most
> > > disenfranchising
> > > > > for
> > > > > > >> those outside the inner group.
> > > > > > >>
> > > > > > >> Table functions (hopefully) would be something that others
> could
> > > > > > > implement.
> > > > > > >
> > > > > > > This is a brainstorm about a user apis...
> > > > > > >
> > > > > > > It isn't appropriate to shoot down ideas immediately in a
> > > brainstorm.
> > > > > It
> > > > > > > has a chilling effect on other people presenting new ideas.
> > > > > > >
> > > > > > > User apis should be defined first with an end-user in mind.
> > > > Consistency
> > > > > > in
> > > > > > > different contexts is also very important (note my expansion of
> > the
> > > > > > > discussion to ASCRIBE METADATA.)
> > > > > > >
> > > > > > > Your statement about extensibility has no technical backing. If
> > you
> > > > > have
> > > > > > a
> > > > > > > concern like this, ask the proposer if they think that this
> could
> > > be
> > > > > done
> > > > > > > in an extensible way. In this case I see a number of ways that
> > this
> > > > > could
> > > > > > > be done. Note the mention of the grammar switch in my initial
> > > > proposal
> > > > > or
> > > > > > > go review my outstanding patch for integrating JSON literals.
> In
> > > many
> > > > > > ways,
> > > > > > > I think this approach could be considered the most extensible
> and
> > > > > > > expressive for a Drill extension developer.
> > > > > > >
> > > > > > > I hope others will continue to brainstorm on this thread.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Julien
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Julien
> > >
> >
>
>
>
> --
> Julien
>

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

TL;DR: I agree that there is some overlap, but I meant to say that "Select
with Option" is needed even if we have a .drill feature.

I'm assuming the .drill file has to be collocated with the data being
analyzed.
For example, in HDFS, if I have /logs/foo/2015/10/21/my.log, I add a .drill
file at /logs/foo/2015/10/21/.drill, and Drill will look in the parent
directory for a .drill file (and possibly in the parent's parent, recursively).
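
To make the lookup I'm assuming concrete, here is a minimal Python sketch.
It is only an illustration: the .drill feature is a proposal, and the
function name find_drill_file and the exists callback are hypothetical,
not part of Drill.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional


def find_drill_file(data_path: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the nearest (hypothetical) .drill file above data_path, or None.

    `exists` is an assumed callback that tells us whether a path is present
    in the underlying file system (HDFS, local disk, ...).
    """
    current = PurePosixPath(data_path).parent
    while True:
        candidate = current / ".drill"
        if exists(str(candidate)):
            return str(candidate)
        if current == current.parent:  # reached the root, nothing found
            return None
        current = current.parent


# Fake file system listing, for illustration only.
present = {"/logs/foo/.drill"}
nearest = find_drill_file("/logs/foo/2015/10/21/my.log", present.__contains__)
print(nearest)
```

The walk stops at the first .drill file it meets, so a file deeper in the
tree would take priority over one closer to the root.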

This is why I mentioned in my previous email that the analyst often
has read-only access (the data being produced by ETL or something else).
Also, the user (probably an analyst) trying to query the data may not want to:
 - have to edit a config file
 - modify configuration that applies to everyone reading the data

So I was thinking of 2 use cases:
 - try to read the data without having to change config that applies to
every user (this thread)
 - set config that configures the system for everyone (.drill file)
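
One way to picture how the two use cases could coexist is as layers of
format options, where a more specific layer overrides a more general one.
The Python sketch below is only an assumption on my part: the thread has
not settled any precedence between per-query options and .drill files,
and resolve_format_options is a hypothetical name.

```python
from typing import Dict, Optional


def resolve_format_options(
    central: Dict[str, str],
    drill_file: Optional[Dict[str, str]] = None,
    query: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge format options; later layers override earlier ones.

    Assumed order: central config < .drill file < options in the query.
    """
    resolved = dict(central)  # copy so the shared config is never mutated
    for layer in (drill_file, query):
        if layer:
            resolved.update(layer)
    return resolved


central = {"type": "text", "delimiter": ","}
per_query = {"delimiter": "|"}
print(resolve_format_options(central, query=per_query))
```

The central configuration is left untouched, which matches the read-only
analyst case: only the query that carries the options sees the override.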




On Wed, Oct 21, 2015 at 5:02 PM, Neeraja Rentachintala <
nrentachintala@maprtech.com> wrote:

> Can you elaborate on what you mean by .drill being a different use case?
>
> In my mind, .drill has 2 use cases: a way to give Drill hints on how to
> read certain datasets (and potentially optimize the queries on those
> datasets), and a way to save the definitions of objects created via Drill
> for reuse/access from BI tools. These two kinds of tables (i.e., existing or
> external tables vs. Drill-created or internal tables) are currently not
> differentiated in Drill, hence I believe they can use the same model in
> terms of metadata handling.
>
> I would be interested in knowing your thoughts.
> -Neeraja
>
> On Wed, Oct 21, 2015 at 4:55 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> > I think of .drill files as a different use case, but there is potentially
> > some overlap.
> > Some things to keep in mind:
> >  - The person analyzing the data often has read-only access.
> >  - Having to write a config file on one end of the system and then query
> > on the other end is not analyst-friendly.
> >
> > We should definitely keep .drill in mind while designing this,
> > although I'm thinking we should probably discuss .drill on a separate
> > thread.
> >
> >
> >
> > On Wed, Oct 21, 2015 at 3:55 PM, Neeraja Rentachintala <
> > nrentachintala@maprtech.com> wrote:
> >
> > > Another alternative is to specify a metadata file (.drill files), an
> > > idea that came up in some of the earlier discussions to solve similar
> > > use cases.
> > > Rather than centrally defining configurations in the storage plugin
> > > (which is what Drill does today), .drill files allow more granularity,
> > > potentially at the folder or individual file level, overriding the
> > > central configuration.
> > >
> > > I think the benefit of the metadata file is that it can be used for other
> > > purposes (such as stats, etc.). Another benefit is that if you are using
> > > a BI/query tool to trigger Drill SQL queries, this will work seamlessly
> > > rather than having to rewrite the query for custom syntax.
> > >
> > > I would like to know what others think of this approach.
> > >
> > > -Neeraja
> > >
> > >
> > >
> > > On Wed, Oct 21, 2015 at 3:43 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > >
> > > > I like the approach of using views to add missing metadata to an
> > existing
> > > > raw dataset. The raw datasets stay the same and the view becomes the
> > > > interface to the data. Nothing is mutated and we know how things are
> > > > derived from one another.
> > > >
> > > > TLDR: I'm trying to summarize the options below and add a few
> > thoughts:
> > > > (please comment whether you think upside/downside elements are valid)
> > > > Let's refer to those options with the name in *bold*
> > > >
> > > > - *urls style params* (my initial strawman): *select * from
> > > > dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> > > > Extensibility is ensured through the storage plugin interpretation of
> > the
> > > > path.
> > > > I agree with decomposing the syntax of format vs the data path/url.
> so
> > > this
> > > > would conflict with HTTP query parameters.
> > > > I will not pursue this one but I think it was a great conversation
> > > starter!
> > > >
> > > > - *table functions*: *select *
> > > > from delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')*
> > > > It sounds like these would have to be defined in the corresponding
> > > > StoragePlugin as they need intimate knowledge of the underlying
> > storage.
> > > > Extensibility would come through the storage plugin defining those?
> > > > +1 on named parameter vs just positional.
> > > > The possible downside is a syntax that could be a little foreign to
> the
> > > > data analyst.
> > > >
> > > > - *fake-o parameters*. *select *
> > > > from dfs.`default`.`/path/to/file/something.psv` where
> > > magicFieldDelimiter
> > > > = '|';*
> > > > I would be inclined to avoid using filters for something that changes
> > > what
> > > > the data looks like.
> > > > This could be unintuitive to users. (at least it feels that way to
> me)
> > > > Extensibility would come through the storage plugin defining
> predicate
> > > push
> > > > down rules?
> > > >
> > > > - *WITH USING OPTIONS*:
> > > > In general it would feel more natural to me to put those options as
> part
> > > of
> > > > a select statement. the select statement can always be used in a WITH
> > > > clause.
> > > > Extensibility would come through the storage plugin receiving those
> > > > options? As the with statement applies to a full select statement
> with
> > > > potentially joins, how would we know where to apply those options?
> > > > We have two sub-options:
> > > >   - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar to
> > > > fake-o
> > > > parameters (above), same comment as above. *AND* seems out of
> place,
> > > for
> > > > example OR would be forbidden.
> > > >   - *{ type: "text", lineDelimiter: "\n"}*: The advantage of this
> is
> > > that
> > > > you can re-use the same syntax in the configuration file. This is a
> > plus
> > > > for consistency. Users would figure out what works and would just
> have
> > to
> > > > put it in the central configuration once done.
> > > >
> > > > - *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns
> > list]
> > > > OPTIONS (type 'text', lineDelimiter);*
> > > > What would be the column list in that case? Would it be awkward to
> use
> > > > EXTEND without a column list?
> > > > Extensibility would come through the storage plugin receiving those
> > > extend
> > > > options.
> > > > It sounds like they could be simply SELECT OPTIONS?
> > > >
> > > > - *Specific syntax*: *select * FROM mydb.mytable*
> > > >
> > > >
> > > >
> > > >
> > > > *    TREAT AS TEXT      USING        LINE DELIMITER '\n'        AND
> > FIELD
> > > > DELIMITER ','        AND TREAT FIRST ROW AS HEADERS*
> > > > Extensibility would come through the storage plugin defining a sub
> > > grammar
> > > > for those options.
> > > > Possibly this is harder to implement for the storage plugin
> > implementor.
> > > > Upside is the user has a SQL like syntax to specify this (although,
> > I've
> > > > never been fond of parts where SQL is trying to be plain English)
> > > >
> > > >
> > > > *Appendix*: :P
> > > > For what it's worth, here is how Pig does it: *LOAD 'data' [USING
> > > function]
> > > > [AS schema];*
> > > > - Just *LOAD '/path/to/my/file'* will use the default Loader (tab
> > > separated
> > > > values)
> > > > - adding a custom Loader* LOAD 'data' USING MyCustomLoader('param1',
> > > > 'param2'); *is how you implement custom formats or remote locations
> > > (HBase,
> > > > Web service, ...)
> > > > So you can use the default loader with a different separator (*USING
> > > > PigStorage('\t')*) in parameter or write your own.
> > > > - The AS clause lets you set the schema. You could have a csv file
> > > without
> > > > header and define the names and types of each column that way.
> > > > Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
> > > >
> > > >
> > > >
> > > > On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org>
> wrote:
> > > >
> > > > > Whatever API is used to scan files from SQL, there will need to be
> a
> > > > > corresponding way to accomplish the same thing in a user interface.
> > > > > Probably a form with various fields, some of them with drop-boxes
> > etc.
> > > > >
> > > > > And ideally a facility that samples a few hundred rows to deduce
> the
> > > > > probable field names and types and which fields are unique.
> > > > >
> > > > > I think that the UI is the true "user friendly" interface. A usage
> > > > > pattern might be for people to define a data source in the
> UI,
> > > > > save it as a view, then switch to the command line to write queries
> > on
> > > > > that view.
> > > > >
> > > > > There are other use cases similar to reading files. For example you
> > > > > would like to read data from an HTTP URL. You might want to specify
> > > > > similar parameters for formats, compression, parsing, and
> parameters
> > > > > in the file URI that describe a family of partitioned files. A URL
> > > > > might allow push-down of filters, projects and sorts. But still you
> > > > > would want to specify formats, compression and parsing the same way
> > as
> > > > > reading files.
> > > > >
> > > > > To me, this argues for decomposing the file scan syntax into pieces
> > > > > that can be re-used if you get data from places other than files.
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > > On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <
> jacques@dremio.com>
> > > > > wrote:
> > > > > >> This fourth is also least extensible and thus most
> > disenfranchising
> > > > for
> > > > > >> those outside the inner group.
> > > > > >>
> > > > > >> Table functions (hopefully) would be something that others could
> > > > > > implement.
> > > > > >
> > > > > > This is a brainstorm about a user apis...
> > > > > >
> > > > > > It isn't appropriate to shoot down ideas immediately in a
> > brainstorm.
> > > > It
> > > > > > has a chilling effect on other people presenting new ideas.
> > > > > >
> > > > > > User apis should be defined first with an end-user in mind.
> > > Consistency
> > > > > in
> > > > > > different contexts is also very important (note my expansion of
> the
> > > > > > discussion to ASCRIBE METADATA.)
> > > > > >
> > > > > > Your statement about extensibility has no technical backing. If
> you
> > > > have
> > > > > a
> > > > > > concern like this, ask the proposer if they think that this could
> > be
> > > > done
> > > > > > in an extensible way. In this case I see a number of ways that
> this
> > > > could
> > > > > > be done. Note the mention of the grammar switch in my initial
> > > proposal
> > > > or
> > > > > > go review my outstanding patch for integrating JSON literals. In
> > many
> > > > > ways,
> > > > > > I think this approach could be considered the most extensible and
> > > > > > expressive for a Drill extension developer.
> > > > > >
> > > > > > I hope others will continue to brainstorm on this thread.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Julien
> > > >
> > >
> >
> >
> >
> > --
> > Julien
> >
>



-- 
Julien

Re: select from table with options

Posted by Neeraja Rentachintala <nr...@maprtech.com>.

Can you elaborate on what you mean by .drill being a different use case?

In my mind, .drill has 2 use cases: a way to give Drill hints on how to
read certain datasets (and potentially optimize the queries on those
datasets), and a way to save the definitions of objects created via Drill
for reuse/access from BI tools. These two kinds of tables (i.e., existing or
external tables vs. Drill-created or internal tables) are currently not
differentiated in Drill, hence I believe they can use the same model in
terms of metadata handling.

I would be interested in knowing your thoughts.
-Neeraja

On Wed, Oct 21, 2015 at 4:55 PM, Julien Le Dem <ju...@dremio.com> wrote:

> I think of .drill files as a different use case, but there is potentially
> some overlap.
> Some things to keep in mind:
>  - The person analyzing the data often has read-only access.
>  - Having to write a config file on one end of the system and then query on
> the other end is not analyst-friendly.
>
> We should definitely keep .drill in mind while designing this,
> although I'm thinking we should probably discuss .drill on a separate
> thread.
>
>
>
> On Wed, Oct 21, 2015 at 3:55 PM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
>
> > Another alternative is to specify a metadata file (.drill files), an idea
> > that came up in some of the earlier discussions to solve similar use
> > cases.
> > Rather than centrally defining configurations in the storage plugin (which
> > is what Drill does today), .drill files allow more granularity,
> > potentially at the folder or individual file level, overriding the
> > central configuration.
> >
> > I think the benefit of the metadata file is that it can be used for other
> > purposes (such as stats, etc.). Another benefit is that if you are using a
> > BI/query tool to trigger Drill SQL queries, this will work seamlessly
> > rather than having to rewrite the query for custom syntax.
> >
> > I would like to know what others think of this approach.
> >
> > -Neeraja
> >
> >
> >
> > On Wed, Oct 21, 2015 at 3:43 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >
> > > I like the approach of using views to add missing metadata to an
> existing
> > > raw dataset. The raw datasets stay the same and the view becomes the
> > > interface to the data. Nothing is mutated and we know how things are
> > > derived from one another.
> > >
> > > TLDR: I'm trying to summarize the options below and add a few
> thoughts:
> > > (please comment whether you think upside/downside elements are valid)
> > > Let's refer to those options with the name in *bold*
> > >
> > > - *urls style params* (my initial strawman): *select * from
> > > dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> > > Extensibility is ensured through the storage plugin interpretation of
> the
> > > path.
> > > I agree with decomposing the syntax of format vs the data path/url. so
> > this
> > > would conflict with HTTP query parameters.
> > > I will not pursue this one but I think it was a great conversation
> > starter!
> > >
> > > - *table functions*: *select *
> > > from delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')*
> > > It sounds like these would have to be defined in the corresponding
> > > StoragePlugin as they need intimate knowledge of the underlying
> storage.
> > > Extensibility would come through the storage plugin defining those?
> > > +1 on named parameter vs just positional.
> > > The possible downside is a syntax that could be a little foreign to the
> > > data analyst.
> > >
> > > - *fake-o parameters*. *select *
> > > from dfs.`default`.`/path/to/file/something.psv` where
> > magicFieldDelimiter
> > > = '|';*
> > > I would be inclined to avoid using filters for something that changes
> > what
> > > the data looks like.
> > > This could be unintuitive to users. (at least it feels that way to me)
> > > Extensibility would come through the storage plugin defining predicate
> > push
> > > down rules?
> > >
> > > - *WITH USING OPTIONS*:
> > > In general it would feel more natural to me to put those options as part
> > of
> > > a select statement. the select statement can always be used in a WITH
> > > clause.
> > > Extensibility would come through the storage plugin receiving those
> > > options? As the with statement applies to a full select statement with
> > > potentially joins, how would we know where to apply those options?
> > > We have two sub-options:
> > >   - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar to
> > > fake-o
> > > parameters (above), same comment as above. *AND* seems out of place,
> > for
> > > example OR would be forbidden.
> > >   - *{ type: "text", lineDelimiter: "\n"}*: The advantage of this is
> > that
> > > you can re-use the same syntax in the configuration file. This is a
> plus
> > > for consistency. Users would figure out what works and would just have
> to
> > > put it in the central configuration once done.
> > >
> > > - *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns
> list]
> > > OPTIONS (type 'text', lineDelimiter);*
> > > What would be the column list in that case? Would it be awkward to use
> > > EXTEND without a column list?
> > > Extensibility would come through the storage plugin receiving those
> > extend
> > > options.
> > > It sounds like they could be simply SELECT OPTIONS?
> > >
> > > - *Specific syntax*: *select * FROM mydb.mytable*
> > >
> > >
> > >
> > >
> > > *    TREAT AS TEXT      USING        LINE DELIMITER '\n'        AND
> FIELD
> > > DELIMITER ','        AND TREAT FIRST ROW AS HEADERS*
> > > Extensibility would come through the storage plugin defining a sub
> > grammar
> > > for those options.
> > > Possibly this is harder to implement for the storage plugin
> implementor.
> > > Upside is the user has a SQL like syntax to specify this (although,
> I've
> > > never been fond of parts where SQL is trying to be plain English)
> > >
> > >
> > > *Appendix*: :P
> > > For what it's worth, here is how Pig does it: *LOAD 'data' [USING
> > function]
> > > [AS schema];*
> > > - Just *LOAD '/path/to/my/file'* will use the default Loader (tab
> > separated
> > > values)
> > > - adding a custom Loader* LOAD 'data' USING MyCustomLoader('param1',
> > > 'param2'); *is how you implement custom formats or remote locations
> > (HBase,
> > > Web service, ...)
> > > So you can use the default loader with a different separator (*USING
> > > PigStorage('\t')*) in parameter or write your own.
> > > - The AS clause lets you set the schema. You could have a csv file
> > without
> > > header and define the names and types of each column that way.
> > > Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
> > >
> > >
> > >
> > > On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > > > Whatever API is used to scan files from SQL, there will need to be a
> > > > corresponding way to accomplish the same thing in a user interface.
> > > > Probably a form with various fields, some of them with drop-boxes
> etc.
> > > >
> > > > And ideally a facility that samples a few hundred rows to deduce the
> > > > probable field names and types and which fields are unique.
> > > >
> > > > I think that the UI is the true "user friendly" interface. A usage
> > > > pattern might be for people to define a data source in the UI,
> > > > save it as a view, then switch to the command line to write queries
> on
> > > > that view.
> > > >
> > > > There are other use cases similar to reading files. For example you
> > > > would like to read data from an HTTP URL. You might want to specify
> > > > similar parameters for formats, compression, parsing, and parameters
> > > > in the file URI that describe a family of partitioned files. A URL
> > > > might allow push-down of filters, projects and sorts. But still you
> > > > would want to specify formats, compression and parsing the same way
> as
> > > > reading files.
> > > >
> > > > To me, this argues for decomposing the file scan syntax into pieces
> > > > that can be re-used if you get data from places other than files.
> > > >
> > > > Julian
> > > >
> > > >
> > > > On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <ja...@dremio.com>
> > > > wrote:
> > > > >> This fourth is also least extensible and thus most
> disenfranchising
> > > for
> > > > >> those outside the inner group.
> > > > >>
> > > > >> Table functions (hopefully) would be something that others could
> > > > > implement.
> > > > >
> > > > > This is a brainstorm about a user apis...
> > > > >
> > > > > It isn't appropriate to shoot down ideas immediately in a
> brainstorm.
> > > It
> > > > > has a chilling effect on other people presenting new ideas.
> > > > >
> > > > > User apis should be defined first with an end-user in mind.
> > Consistency
> > > > in
> > > > > different contexts is also very important (note my expansion of the
> > > > > discussion to ASCRIBE METADATA.)
> > > > >
> > > > > Your statement about extensibility has no technical backing. If you
> > > have
> > > > a
> > > > > concern like this, ask the proposer if they think that this could
> be
> > > done
> > > > > in an extensible way. In this case I see a number of ways that this
> > > could
> > > > > be done. Note the mention of the grammar switch in my initial
> > proposal
> > > or
> > > > > go review my outstanding patch for integrating JSON literals. In
> many
> > > > ways,
> > > > > I think this approach could be considered the most extensible and
> > > > > expressive for a Drill extension developer.
> > > > >
> > > > > I hope others will continue to brainstorm on this thread.
> > > >
> > >
> > >
> > >
> > > --
> > > Julien
> > >
> >
>
>
>
> --
> Julien
>

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I think of .drill files as a different use case, but there is potentially
some overlap.
Some things to keep in mind:
 - The person analyzing the data often has read-only access.
 - Having to write a config file on one end of the system and then query on
the other end is not analyst-friendly.

We should definitely keep .drill in mind while designing this,
although I'm thinking we should probably discuss .drill on a separate
thread.



On Wed, Oct 21, 2015 at 3:55 PM, Neeraja Rentachintala <
nrentachintala@maprtech.com> wrote:

> Another alternative is to specify a metadata file (.drill files), an idea
> that came up in some of the earlier discussions to solve similar use cases.
> Rather than centrally defining configurations in the storage plugin (which is
> what Drill does today), .drill files allow more granularity, potentially
> at the folder or individual file level, overriding the central configuration.
>
> I think the benefit of the metadata file is that it can be used for other
> purposes (such as stats, etc.). Another benefit is that if you are using a
> BI/query tool to trigger Drill SQL queries, this will work seamlessly
> rather than having to rewrite the query for custom syntax.
>
> I would like to know what others think of this approach.
>
> -Neeraja
>
>
>
> On Wed, Oct 21, 2015 at 3:43 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> > I like the approach of using views to add missing metadata to an existing
> > raw dataset. The raw datasets stay the same and the view becomes the
> > interface to the data. Nothing is mutated and we know how things are
> > derived from one another.
> >
> > TLDR: I'm trying to summarize the options below and add a few thoughts:
> > (please comment whether you think upside/downside elements are valid)
> > Let's refer to those options with the name in *bold*
> >
> > - *urls style params* (my initial strawman): *select * from
> > dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> > Extensibility is ensured through the storage plugin interpretation of the
> > path.
> > I agree with decomposing the syntax of format vs the data path/url. so
> this
> > would conflict with HTTP query parameters.
> > I will not pursue this one but I think it was a great conversation
> starter!
> >
> > - *table functions*: *select *
> > from delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')*
> > It sounds like these would have to be defined in the corresponding
> > StoragePlugin as they need intimate knowledge of the underlying storage.
> > Extensibility would come through the storage plugin defining those?
> > +1 on named parameter vs just positional.
> > The possible downside is a syntax that could be a little foreign to the
> > data analyst.
> >
> > - *fake-o parameters*. *select *
> > from dfs.`default`.`/path/to/file/something.psv` where
> magicFieldDelimiter
> > = '|';*
> > I would be inclined to avoid using filters for something that changes
> what
> > the data looks like.
> > This could be unintuitive to users. (at least it feels that way to me)
> > Extensibility would come through the storage plugin defining predicate
> push
> > down rules?
> >
> > - *WITH USING OPTIONS*:
> > In general it would feel more natural to me to put those options as part
> of
> > a select statement. the select statement can always be used in a WITH
> > clause.
> > Extensibility would come through the storage plugin receiving those
> > options? As the with statement applies to a full select statement with
> > potentially joins, how would we know where to apply those options?
> > We have two sub-options:
> >   - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar to
> > fake-o
> > parameters (above), same comment as above. *AND* seems out of place,
> for
> > example OR would be forbidden.
> >   - *{ type: "text", lineDelimiter: "\n"}*: The advantage of this is
> that
> > you can re-use the same syntax in the configuration file. This is a plus
> > for consistency. Users would figure out what works and would just have to
> > put it in the central configuration once done.
> >
> > - *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns list]
> > OPTIONS (type 'text', lineDelimiter);*
> > What would be the column list in that case? Would it be awkward to use
> > EXTEND without a column list?
> > Extensibility would come through the storage plugin receiving those
> extend
> > options.
> > It sounds like they could be simply SELECT OPTIONS?
> >
> > - *Specific syntax*: *select * FROM mydb.mytable*
> >
> >
> >
> >
> > *    TREAT AS TEXT      USING        LINE DELIMITER '\n'        AND FIELD
> > DELIMITER ','        AND TREAT FIRST ROW AS HEADERS*
> > Extensibility would come through the storage plugin defining a sub
> grammar
> > for those options.
> > Possibly this is harder to implement for the storage plugin implementor.
> > Upside is the user has a SQL like syntax to specify this (although, I've
> > never been fond of parts where SQL is trying to be plain English)
> >
> >
> > *Appendix*: :P
> > For what it's worth, here is how Pig does it: *LOAD 'data' [USING
> function]
> > [AS schema];*
> > - Just *LOAD '/path/to/my/file'* will use the default Loader (tab
> separated
> > values)
> > - adding a custom Loader* LOAD 'data' USING MyCustomLoader('param1',
> > 'param2'); *is how you implement custom formats or remote locations
> (HBase,
> > Web service, ...)
> > So you can use the default loader with a different separator (*USING
> > PigStorage('\t')*) in parameter or write your own.
> > - The AS clause lets you set the schema. You could have a csv file
> without
> > header and define the names and types of each column that way.
> > Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
> >
> >
> >
> > On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> > > Whatever API is used to scan files from SQL, there will need to be a
> > > corresponding way to accomplish the same thing in a user interface.
> > > Probably a form with various fields, some of them with drop-boxes etc.
> > >
> > > And ideally a facility that samples a few hundred rows to deduce the
> > > probable field names and types and which fields are unique.
> > >
> > > I think that the UI is the true "user friendly" interface. A usage
> > > pattern might be for people to define a data source using in the UI,
> > > save it as a view, then switch to the command line to write queries on
> > > that view.
> > >
> > > There are other use cases similar to reading flies. For example you
> > > would like to read data from an HTTP URL. You might want to specify
> > > similar parameters for formats, compression, parsing, and parameters
> > > in the file URI that describe a family of partitioned files. A URL
> > > might allow push-down of filters, projects and sorts. But still you
> > > would want to specify formats, compression and parsing the same way as
> > > reading files.
> > >
> > > To me, this argues for decomposing the file scan syntax into pieces
> > > that can be re-used if you get data from places other than files.
> > >
> > > Julian
> > >
> > >
> > > On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <ja...@dremio.com>
> > > wrote:
> > > >> This fourth is also least extensible and thus most disenfranchising
> > for
> > > >> those outside the inner group.
> > > >>
> > > >> Table functions (hopefully) would be something that others could
> > > > implement.
> > > >
> > > > This is a brainstorm about a user apis...
> > > >
> > > > It isn't appropriate to shoot down ideas immediately in a brainstorm.
> > It
> > > > has a chilling effect on other people presenting new ideas.
> > > >
> > > > User apis should be defined first with an end-user in mind.
> Consistency
> > > in
> > > > different contexts is also very important (note my expansion of the
> > > > discussion to ASCRIBE METADATA.)
> > > >
> > > > Your statement about extensibility has no technical backing. If you
> > have
> > > a
> > > > concern like this, ask the proposer if they think that this could be
> > done
> > > > in an extensible way. In this case I see a number of ways that this
> > could
> > > > be done. Note the mention of the grammar switch in my initial
> proposal
> > or
> > > > go review my outstanding patch for integrating JSON literals. In many
> > > ways,
> > > > I think this approach could be considered the most extensible and
> > > > expressive for a Drill extension developer.
> > > >
> > > > I hope others will continue to brainstorm on this thread.
> > >
> >
> >
> >
> > --
> > Julien
> >
>



-- 
Julien

Re: select from table with options

Posted by Neeraja Rentachintala <nr...@maprtech.com>.

Another alternative is to specify a metadata file (.drill files), an idea
that came up in some of the earlier discussions to solve similar use cases.
Rather than centrally defining configurations in the storage plugin (which is
what Drill does today), .drill files allow more granularity, potentially
at the folder or individual file level, and would override the central
configuration.

I think the benefit of the metadata file is that it can be used for other
purposes (such as stats etc.). Another benefit is that if you are using a
BI/query tool to trigger Drill SQL queries, this will work seamlessly
rather than having to rewrite the query for custom syntax.
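
To make the override order concrete, here is a minimal sketch in Python. It
assumes a hypothetical layout in which each folder or file may carry a .drill
file with format options; the paths, option names, and the resolve_format
helper are illustrative assumptions, not an existing Drill feature.

```python
def resolve_format(path, central, overrides):
    """Merge format options from the least to the most specific location.

    central:   options from the central storage plugin configuration.
    overrides: maps a folder or file path to the options that a
               hypothetical .drill file at that location would declare.
    """
    resolved = dict(central)
    parts = path.strip("/").split("/")
    # Walk from the top-level folder down to the file itself, so that
    # more specific locations overwrite less specific ones.
    for depth in range(1, len(parts) + 1):
        location = "/" + "/".join(parts[:depth])
        resolved.update(overrides.get(location, {}))
    return resolved


central = {"type": "text", "fieldDelimiter": ","}
overrides = {
    "/data/logs": {"fieldDelimiter": "\t"},
    "/data/logs/something.psv": {"fieldDelimiter": "|"},
}

psv = resolve_format("/data/logs/something.psv", central, overrides)
tsv = resolve_format("/data/logs/other.tsv", central, overrides)
csv_file = resolve_format("/data/other/file.csv", central, overrides)
```

Because the resolution happens outside the query text, a BI tool could keep
issuing a plain select against the path.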

I would like to know what others think of this approach.

-Neeraja



On Wed, Oct 21, 2015 at 3:43 PM, Julien Le Dem <ju...@dremio.com> wrote:

> I like the approach of using views to add missing metadata to an existing
> raw dataset. The raw datasets stay the same and the view becomes the
> interface to the data. Nothing is mutated and we know how things are
> derived from one another.
>
> TLDR: I'm trying to summarize the options bellow and add a few thoughts:
> (please comment whether you think upside/downside elements are valid)
> Let's refer to those options with the name in *bold*
>
> - *urls style params* (my initial strawman): *select * from
> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> Extensibility is ensured through the storage plugin interpretation of the
> path.
> I agree with decomposing the syntax of format vs the data path/url. so this
> would conflict with HTTP query parameters.
> I will not pursue this one but I think it was a great conversation starter!
>
> - *table functions*: *select *
> from delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')*
> It sounds like these would have to be defined in the corresponding
> StoragePlugin as they need intimate knowledge of the underlying storage.
> Extensibility would come through the storage plugin defining those?
> +1 on named parameter vs just positional.
> The possible downside is a syntax that could be a little foreign to the
> data analyst.
>
> - *fake-o parameters*. *select *
> from dfs.`default`.`/path/to/file/something.psv` where magicFieldDelimiter
> = '|';*
> I would be inclined to avoid using filters for something that changes what
> the data looks like.
> This could be unintuitive to users. (at least it feels that way to me)
> Extensibility would come through the storage plugin defining predicate push
> down rules?
>
> - *WITH USING OPTIONS*:
> In general I would feel more natural to me to put those options as part of
> a select statement. the select statement can always be used in a WITH
> clause.
> Extensibility would come through the storage plugin receiving those
> options? As the with statement applies to a full select statement with
> potentially joins, how would we know where to aply those options?
> We have two sub-options:
>   - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar to
> fake-o
> parameters (above), same comment than above. *AND* seems out of place, for
> example OR would be forbidden.
>   - *{ type: "text", linedDelimiter = "\n"}*: The advantage of this is that
> you can re-use the same syntax in the configuration file. This is a plus
> for consistency. Users would figure out what works and would just have to
> put it in the central configuration once done.
>
> - *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns list]
> OPTIONS (type 'text', lineDelimiter);*
> What would be the column list in that case? Would it be awkward to use
> EXTEND without a column list?
> Extensibility would come through the storage plugin receiving those extend
> options.
> It sounds like they could be simply SELECT OPTIONS?
>
> - *Specific syntax*: *select * FROM mydb.mytable*
>
>
>
>
> *    TREAT AS TEXT      USING        LINE DELIMITER '\n'        AND FIELD
> DELIMITER ','        AND TREAT FIRST ROW AS HEADERS*
> Extensibility would come through the storage plugin defining a sub grammar
> for those options.
> Possibly this is harder to implement for the storage plugin implementor.
> Upside is the user has a SQL like syntax to specify this (although, I've
> never been fond of parts where SQL is trying to be plain English)
>
>
> *Appendix*: :P
> For what it's worth, here is how Pig does it: *LOAD 'data' [USING function]
> [AS schema];*
> - Just *LOAD '/path/to/my/file'* will use the default Loader (tab separated
> values)
> - adding a custom Loader* LOAD 'data' USING MyCustomLoader('param1',
> 'param2'); *is how you implement custom formats or remote locations (HBase,
> Web service, ...)
> So you can use the default loader with a different separator (*USING
> PigStorage('\t')*) in parameter or write your own.
> - The AS clause lets you set the schema. You could have a csv file without
> header and define the names and types of each column that way.
> Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
>
>
>
> On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org> wrote:
>
> > Whatever API is used to scan files from SQL, there will need to be a
> > corresponding way to accomplish the same thing in a user interface.
> > Probably a form with various fields, some of them with drop-boxes etc.
> >
> > And ideally a facility that samples a few hundred rows to deduce the
> > probable field names and types and which fields are unique.
> >
> > I think that the UI is the true "user friendly" interface. A usage
> > pattern might be for people to define a data source using in the UI,
> > save it as a view, then switch to the command line to write queries on
> > that view.
> >
> > There are other use cases similar to reading flies. For example you
> > would like to read data from an HTTP URL. You might want to specify
> > similar parameters for formats, compression, parsing, and parameters
> > in the file URI that describe a family of partitioned files. A URL
> > might allow push-down of filters, projects and sorts. But still you
> > would want to specify formats, compression and parsing the same way as
> > reading files.
> >
> > To me, this argues for decomposing the file scan syntax into pieces
> > that can be re-used if you get data from places other than files.
> >
> > Julian
> >
> >
> > On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <ja...@dremio.com>
> > wrote:
> > >> This fourth is also least extensible and thus most disenfranchising
> for
> > >> those outside the inner group.
> > >>
> > >> Table functions (hopefully) would be something that others could
> > > implement.
> > >
> > > This is a brainstorm about a user apis...
> > >
> > > It isn't appropriate to shoot down ideas immediately in a brainstorm.
> It
> > > has a chilling effect on other people presenting new ideas.
> > >
> > > User apis should be defined first with an end-user in mind. Consistency
> > in
> > > different contexts is also very important (note my expansion of the
> > > discussion to ASCRIBE METADATA.)
> > >
> > > Your statement about extensibility has no technical backing. If you
> have
> > a
> > > concern like this, ask the proposer if they think that this could be
> done
> > > in an extensible way. In this case I see a number of ways that this
> could
> > > be done. Note the mention of the grammar switch in my initial proposal
> or
> > > go review my outstanding patch for integrating JSON literals. In many
> > ways,
> > > I think this approach could be considered the most extensible and
> > > expressive for a Drill extension developer.
> > >
> > > I hope others will continue to brainstorm on this thread.
> >
>
>
>
> --
> Julien
>

Re: select from table with options

Posted by Julien Le Dem <ju...@dremio.com>.

I like the approach of using views to add missing metadata to an existing
raw dataset. The raw datasets stay the same and the view becomes the
interface to the data. Nothing is mutated and we know how things are
derived from one another.

TLDR: I'm trying to summarize the options below and add a few thoughts
(please comment on whether you think the upside/downside elements are valid).
Let's refer to those options by the name in *bold*.

- *urls style params* (my initial strawman): *select * from
dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
Extensibility is ensured through the storage plugin interpretation of the
path.
I agree with decomposing the syntax into format vs. data path/URL. This
style would also conflict with HTTP query parameters.
I will not pursue this one but I think it was a great conversation starter!

- *table functions*: *select *
from delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')*
It sounds like these would have to be defined in the corresponding
StoragePlugin as they need intimate knowledge of the underlying storage.
Extensibility would come through the storage plugin defining those?
+1 on named parameter vs just positional.
The possible downside is a syntax that could be a little foreign to the
data analyst.

- *fake-o parameters*. *select *
from dfs.`default`.`/path/to/file/something.psv` where magicFieldDelimiter
= '|';*
I would be inclined to avoid using filters for something that changes what
the data looks like.
This could be unintuitive to users. (at least it feels that way to me)
Extensibility would come through the storage plugin defining predicate push
down rules?

- *WITH USING OPTIONS*:
In general it would feel more natural to me to put those options in the
select statement itself; the select statement can always be used in a WITH
clause.
Extensibility would come through the storage plugin receiving those
options? As the WITH statement applies to a full select statement with
potentially several joins, how would we know where to apply those options?
We have two sub-options:
  - *(type = 'text' AND lineDelimiter = '\n')*: this seems similar to fake-o
parameters (above), and the same comment applies. *AND* seems out of place:
for example, OR would have to be forbidden.
  - *{ type: "text", lineDelimiter: "\n"}*: The advantage of this is that
you can re-use the same syntax in the configuration file. This is a plus
for consistency. Users would figure out what works and would just have to
put it in the central configuration once done.

- *EXTEND WITH OPTIONS*: *SELECT FROM emp EXTEND [optional columns list]
OPTIONS (type 'text', lineDelimiter);*
What would be the column list in that case? Would it be awkward to use
EXTEND without a column list?
Extensibility would come through the storage plugin receiving those extend
options.
It sounds like they could be simply SELECT OPTIONS?

- *Specific syntax*:
*select * FROM mydb.mytable
    TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS*
Extensibility would come through the storage plugin defining a sub grammar
for those options.
Possibly this is harder to implement for the storage plugin implementor.
Upside is the user has a SQL like syntax to specify this (although, I've
never been fond of parts where SQL is trying to be plain English)
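
As an illustration of the first option in the list above (URL-style
parameters), the sketch below splits a location string into a path and
format options using Python's standard urllib.parse. The helper name
split_path_and_options is hypothetical. The second call hints at why this
style clashes with HTTP sources: nothing distinguishes a parameter meant
for the remote server from one meant for the reader.

```python
from urllib.parse import parse_qsl, urlsplit


def split_path_and_options(location):
    """Split 'path?key=value&...' into (path, options dict)."""
    parts = urlsplit(location)
    return parts.path, dict(parse_qsl(parts.query))


path, options = split_path_and_options(
    "/path/to/file/something.psv?type=text&delimiter=|"
)

# With an HTTP source the query string is ambiguous: is 'token' a format
# option or something the server expects? (example.com is a placeholder.)
http_path, http_options = split_path_and_options(
    "http://example.com/data.psv?token=abc&delimiter=|"
)
```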


*Appendix*: :P
For what it's worth, here is how Pig does it: *LOAD 'data' [USING function]
[AS schema];*
- Just *LOAD '/path/to/my/file'* will use the default Loader (tab-separated
values)
- adding a custom Loader, *LOAD 'data' USING MyCustomLoader('param1',
'param2');*, is how you implement custom formats or remote locations (HBase,
Web service, ...)
So you can use the default loader with the separator passed as a parameter
(*USING PigStorage('\t')*) or write your own.
- The AS clause lets you set the schema. You could have a csv file without
a header and define the names and types of each column that way.
Doc: https://pig.apache.org/docs/r0.15.0/basic.html#load
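
For readers who don't know Pig, here is a rough Python analogue of what the
USING and AS clauses provide: a separator passed to the loader, plus a
schema that names and types the columns of a headerless file. The load
function and its parameters are illustrative assumptions only and do not
mirror Pig's implementation.

```python
import csv
import io


def load(text, delimiter="\t", schema=None):
    """Parse delimited text; optionally apply (name, cast) pairs per column."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    if schema is None:
        # No schema: return raw string tuples, like a loader without AS.
        return [tuple(row) for row in rows]
    return [
        {name: cast(value) for (name, cast), value in zip(schema, row)}
        for row in rows
    ]


raw = load("1\tann\n2\tbob\n")
records = load("1,ann\n2,bob\n", delimiter=",",
               schema=[("id", int), ("name", str)])
```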



On Wed, Oct 21, 2015 at 8:57 AM, Julian Hyde <jh...@apache.org> wrote:

> Whatever API is used to scan files from SQL, there will need to be a
> corresponding way to accomplish the same thing in a user interface.
> Probably a form with various fields, some of them with drop-boxes etc.
>
> And ideally a facility that samples a few hundred rows to deduce the
> probable field names and types and which fields are unique.
>
> I think that the UI is the true "user friendly" interface. A usage
> pattern might be for people to define a data source using in the UI,
> save it as a view, then switch to the command line to write queries on
> that view.
>
> There are other use cases similar to reading flies. For example you
> would like to read data from an HTTP URL. You might want to specify
> similar parameters for formats, compression, parsing, and parameters
> in the file URI that describe a family of partitioned files. A URL
> might allow push-down of filters, projects and sorts. But still you
> would want to specify formats, compression and parsing the same way as
> reading files.
>
> To me, this argues for decomposing the file scan syntax into pieces
> that can be re-used if you get data from places other than files.
>
> Julian
>
>
> On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >> This fourth is also least extensible and thus most disenfranchising for
> >> those outside the inner group.
> >>
> >> Table functions (hopefully) would be something that others could
> > implement.
> >
> > This is a brainstorm about a user apis...
> >
> > It isn't appropriate to shoot down ideas immediately in a brainstorm. It
> > has a chilling effect on other people presenting new ideas.
> >
> > User apis should be defined first with an end-user in mind. Consistency
> in
> > different contexts is also very important (note my expansion of the
> > discussion to ASCRIBE METADATA.)
> >
> > Your statement about extensibility has no technical backing. If you have
> a
> > concern like this, ask the proposer if they think that this could be done
> > in an extensible way. In this case I see a number of ways that this could
> > be done. Note the mention of the grammar switch in my initial proposal or
> > go review my outstanding patch for integrating JSON literals. In many
> ways,
> > I think this approach could be considered the most extensible and
> > expressive for a Drill extension developer.
> >
> > I hope others will continue to brainstorm on this thread.
>



-- 
Julien

Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

Whatever API is used to scan files from SQL, there will need to be a
corresponding way to accomplish the same thing in a user interface.
Probably a form with various fields, some of them with drop-boxes etc.

And ideally a facility that samples a few hundred rows to deduce the
probable field names and types and which fields are unique.
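
A minimal sketch of such a sampling facility, in Python, assuming a
delimited text file whose first row is a header. The function names, the
type labels (INTEGER, DOUBLE, VARCHAR) and the 200-row limit are
illustrative choices, not part of any existing tool.

```python
import csv
import io


def infer_column(values):
    """Return the narrowest type label that every sampled value fits."""
    for caster, label in ((int, "INTEGER"), (float, "DOUBLE")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "VARCHAR"


def sample_schema(text, delimiter, limit=200):
    """Read up to `limit` rows and guess name, type and uniqueness per field."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [row for _, row in zip(range(limit), reader)]
    schema = {}
    for index, name in enumerate(header):
        column = [row[index] for row in rows]
        schema[name] = {
            "type": infer_column(column),
            "unique": len(set(column)) == len(column),
        }
    return schema


schema = sample_schema("id|name|score\n1|ann|3.5\n2|bob|4\n3|ann|2.5\n", "|")
```

Uniqueness here only holds for the sample, so a UI would present it as a
guess rather than a guarantee.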

I think that the UI is the true "user friendly" interface. A usage
pattern might be for people to define a data source using the UI,
save it as a view, then switch to the command line to write queries on
that view.

There are other use cases similar to reading files. For example, you
might want to read data from an HTTP URL. You might want to specify
similar parameters for formats, compression, parsing, and parameters
in the file URI that describe a family of partitioned files. A URL
might allow push-down of filters, projects and sorts. But still you
would want to specify formats, compression and parsing the same way as
reading files.

To me, this argues for decomposing the file scan syntax into pieces
that can be re-used if you get data from places other than files.
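
One way to picture that decomposition, as a Python sketch under stated
assumptions: the scan is split into a fetch step (where the bytes come
from), a compression step and a parsing step, so the last two are shared
across sources. The scan function and its parameters are hypothetical, and
the "HTTP" source is simulated in memory to keep the example
self-contained.

```python
import csv
import gzip
import io


def scan(fetch, compression, delimiter):
    """Fetch raw bytes from any source, then decompress and parse them."""
    raw = fetch()
    if compression == "gzip":
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


payload = b"a|b\n1|2\n"

# Two different "sources" that share the same format and parsing options.
from_file = scan(lambda: payload, None, "|")
from_http = scan(lambda: gzip.compress(payload), "gzip", "|")
```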

Julian

On Wed, Oct 21, 2015 at 6:15 AM, Jacques Nadeau <ja...@dremio.com> wrote:
>> This fourth is also least extensible and thus most disenfranchising for
>> those outside the inner group.
>>
>> Table functions (hopefully) would be something that others could
> implement.
>
> This is a brainstorm about a user apis...
>
> It isn't appropriate to shoot down ideas immediately in a brainstorm. It
> has a chilling effect on other people presenting new ideas.
>
> User apis should be defined first with an end-user in mind. Consistency in
> different contexts is also very important (note my expansion of the
> discussion to ASCRIBE METADATA.)
>
> Your statement about extensibility has no technical backing. If you have a
> concern like this, ask the proposer if they think that this could be done
> in an extensible way. In this case I see a number of ways that this could
> be done. Note the mention of the grammar switch in my initial proposal or
> go review my outstanding patch for integrating JSON literals. In many ways,
> I think this approach could be considered the most extensible and
> expressive for a Drill extension developer.
>
> I hope others will continue to brainstorm on this thread.

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

> This fourth is also least extensible and thus most disenfranchising for
> those outside the inner group.
>
> Table functions (hopefully) would be something that others could
implement.

This is a brainstorm about user APIs...

It isn't appropriate to shoot down ideas immediately in a brainstorm. It
has a chilling effect on other people presenting new ideas.

User APIs should be defined first with an end-user in mind. Consistency in
different contexts is also very important (note my expansion of the
discussion to ASCRIBE METADATA.)

Your statement about extensibility has no technical backing. If you have a
concern like this, ask the proposer if they think that this could be done
in an extensible way. In this case I see a number of ways that this could
be done. Note the mention of the grammar switch in my initial proposal or
go review my outstanding patch for integrating JSON literals. In many ways,
I think this approach could be considered the most extensible and
expressive for a Drill extension developer.

I hope others will continue to brainstorm on this thread.

Re: select from table with options

Posted by Ted Dunning <te...@gmail.com>.

On Tue, Oct 20, 2015 at 5:35 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> ** (4) Solve with specific syntax for the most common scenarios (very
> declarative) **
>   select * FROM
>   mydb.mytable
>     TREAT AS TEXT
>       USING
>         LINE DELIMITER '\n'
>         AND FIELD DELIMITER ','
>         AND TREAT FIRST ROW AS HEADERS
>
>
> I'm actually most inclined to the fourth. It seems like the most user
> friendly. From a grammar perspective, I think you need to figure out a way
> to use TREAT AS as a grammar switch so we can avoid protecting these
> expressions in all contexts. What is nice about this pattern is that is
> understandable by non-technical users and fits sql. Hiding things in a
> table function makes things more complex.
>

This fourth option is also the least extensible and thus the most
disenfranchising for those outside the inner group.

Table functions (hopefully) would be something that others could implement.

Re: select from table with options

Posted by Jacques Nadeau <ja...@dremio.com>.

Some more options:

** (1) add options to WITH clause (more declarative) **
  WITH MyTable
  AS
    (select * from mydb.mytable)
    USING OPTIONS  (type = 'text' AND lineDelimiter = '\n')

** (2) add options to WITH (json literal based) **
  WITH MyTable
  AS
  (select * from mydb.mytable)
  USING OPTIONS {  type: "text", lineDelimiter: "\n"}

** (3) Enhance the Calcite EXTEND clause to support an OPTIONS clause **
  SELECT FROM emp EXTEND [optional columns list] OPTIONS (type 'text',
lineDelimiter);

** (4) Solve with specific syntax for the most common scenarios (very
declarative) **
  select * FROM
  mydb.mytable
    TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS


I'm actually most inclined to the fourth. It seems like the most user
friendly. From a grammar perspective, I think you need to figure out a way
to use TREAT AS as a grammar switch so we can avoid protecting these
expressions in all contexts. What is nice about this pattern is that it is
understandable by non-technical users and fits SQL. Hiding things in a
table function makes things more complex.

I think we should think about this in the context of ALTER TABLE ASCRIBE
METADATA as well.

For example:
ALTER TABLE mydb.mytable
  ASCRIBE METADATA
    TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS

If we use table functions, I'm not sure how we make the ascribe metadata
operation have the same syntax.
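
To illustrate how one clause could serve both statements, here is a small
Python sketch that parses the TREAT AS ... USING ... text into an options
dictionary, regardless of whether it follows a select or an ALTER TABLE ...
ASCRIBE METADATA. It is a toy under stated assumptions: the regular
expressions, the function names and the output keys (lineDelimiter,
fieldDelimiter, extractHeader) are illustrative, escape sequences are kept
as written, and a delimiter containing " AND " would break it.

```python
import re


def parse_treat_as(clause):
    """Parse 'TREAT AS <type> [USING <opt> AND <opt> ...]' into a dict."""
    match = re.fullmatch(
        r"\s*TREAT\s+AS\s+(\w+)(?:\s+USING\s+(.*))?\s*", clause, re.S | re.I
    )
    if not match:
        raise ValueError("expected TREAT AS <type> [USING ...]")
    options = {"type": match.group(1).lower()}
    for item in re.split(r"\s+AND\s+", match.group(2) or "", flags=re.I):
        item = item.strip()
        if not item:
            continue
        if re.fullmatch(r"TREAT\s+FIRST\s+ROW\s+AS\s+HEADERS", item, re.I):
            options["extractHeader"] = True
            continue
        opt = re.fullmatch(r"(LINE|FIELD)\s+DELIMITER\s+'(.*)'", item, re.I | re.S)
        if not opt:
            raise ValueError(f"unrecognized option: {item!r}")
        options[opt.group(1).lower() + "Delimiter"] = opt.group(2)
    return options


def options_from_statement(sql):
    """Locate the TREAT AS clause in either statement form and parse it."""
    found = re.search(r"\bTREAT\s+AS\b", sql, re.I)
    return parse_treat_as(sql[found.start():]) if found else {}


CLAUSE = r"""TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS"""

select_options = options_from_statement("select * FROM mydb.mytable\n" + CLAUSE)
alter_options = options_from_statement(
    "ALTER TABLE mydb.mytable\n  ASCRIBE METADATA\n" + CLAUSE
)
```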


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Oct 20, 2015 at 11:32 AM, Julian Hyde <jh...@apache.org> wrote:

> +1 to use table functions
>
> In Calcite (and I presume Drill) a “table function” may actually function
> more like a (Lisp) macro. The function gets called at prepare time to yield
> a RelNode (say a TableScan). So a table function is every bit as efficient
> as using a table, but it allows extra parameters.
>
> If the table function has a lot of parameters it might be nice to support
> named parameters:
>
> select * from table(disitributedFile(path => ‘/path/to/something.psv’,
> delimiter => ‘|’));
>
> Named parameters are in the SQL standard but are not supported by
> Calcite’s parser currently. Parameters can be specified in any order, and
> those not specified have a default value.
>
> Julian
>
>
> > On Oct 19, 2015, at 5:18 PM, Ted Dunning <te...@gmail.com> wrote:
> >
> > Wouldn't a table function be a better option?
> >
> > Something like this perhaps?
> >
> > select * from
> > delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
> >
> > ?
> >
> > Or how about fake-o parameters that the delimited record scanner knows
> how
> > to push down into the scanning of the data? That would look like this:
> >
> > select * from
> > dfs.`default`.`/path/to/file/something.psv`
> > where magicFieldDelimiter = '|';
> >
> >
> >
> > On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >
> >> I'm looking into passing information on how to interpret a file through
> the
> >> select clause in Drill.
> >> Something along the lines of:
> >> *select * from
> >> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> >> (In this example, we want to specify a specific delimiter, but that
> would
> >> apply to any *type* of format)
> >>
> >> Which would allow to read a file without having to centrally configure
> >> formats: https://drill.apache.org/docs/querying-plain-text-files/
> >> Which makes it easier to try to read an existing file.
> >> Typically once the user has found the proper settings, they would update
> >> the central configuration.
> >>
> >> thoughts?
> >>
> >> --
> >> Julien
> >>
>
>

Re: select from table with options

Posted by Jim Scott <js...@maprtech.com>.

My initial reaction to a table function was that it sounded kind of
sketchy. But given Julian's elaboration and description, this sounds like a
great idea.

From a user perspective this is easy to understand and flexible. To me I
see this table function model effectively like a hint for how to handle the
data and I think others will see it that way too.

+1

On Tue, Oct 20, 2015 at 1:32 PM, Julian Hyde <jh...@apache.org> wrote:

> +1 to use table functions
>
> In Calcite (and I presume Drill) a “table function” may actually function
> more like a (Lisp) macro. The function gets called at prepare time to yield
> a RelNode (say a TableScan). So a table function is every bit as efficient
> as using a table, but it allows extra parameters.
>
> If the table function has a lot of parameters it might be nice to support
> named parameters:
>
> select * from table(disitributedFile(path => ‘/path/to/something.psv’,
> delimiter => ‘|’));
>
> Named parameters are in the SQL standard but are not supported by
> Calcite’s parser currently. Parameters can be specified in any order, and
> those not specified have a default value.
>
> Julian
>
>
> > On Oct 19, 2015, at 5:18 PM, Ted Dunning <te...@gmail.com> wrote:
> >
> > Wouldn't a table function be a better option?
> >
> > Something like this perhaps?
> >
> > select * from
> > delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
> >
> > ?
> >
> > Or how about fake-o parameters that the delimited record scanner knows
> how
> > to push down into the scanning of the data? That would look like this:
> >
> > select * from
> > dfs.`default`.`/path/to/file/something.psv`
> > where magicFieldDelimiter = '|';
> >
> >
> >
> > On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> >
> >> I'm looking into passing information on how to interpret a file through
> the
> >> select clause in Drill.
> >> Something along the lines of:
> >> *select * from
> >> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> >> (In this example, we want to specify a specific delimiter, but that
> would
> >> apply to any *type* of format)
> >>
> >> Which would allow to read a file without having to centrally configure
> >> formats: https://drill.apache.org/docs/querying-plain-text-files/
> >> Which makes it easier to try to read an existing file.
> >> Typically once the user has found the proper settings, they would update
> >> the central configuration.
> >>
> >> thoughts?
> >>
> >> --
> >> Julien
> >>
>
>


-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>


Re: select from table with options

Posted by Julian Hyde <jh...@apache.org>.

+1 to use table functions

In Calcite (and I presume Drill) a “table function” may actually function more like a (Lisp) macro. The function gets called at prepare time to yield a RelNode (say a TableScan). So a table function is every bit as efficient as using a table, but it allows extra parameters.

If the table function has a lot of parameters it might be nice to support named parameters:

select * from table(distributedFile(path => '/path/to/something.psv', delimiter => '|'));
 
Named parameters are in the SQL standard but are not supported by Calcite’s parser currently. Parameters can be specified in any order, and those not specified have a default value.
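
The binding rule described here (any order, defaults for what is omitted)
can be sketched in a few lines of Python. The parameter spec and the
bind_named_args helper are hypothetical and only illustrate the semantics;
they are not Calcite's or Drill's API.

```python
# Hypothetical parameter spec for a file-reading table function.
# None marks a required parameter that has no default.
PARAMS = {
    "path": None,
    "delimiter": ",",
    "header": False,
}


def bind_named_args(spec, **named):
    """Resolve named arguments against a spec, in any order, with defaults."""
    unknown = set(named) - set(spec)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    bound = {**spec, **named}
    missing = [name for name, value in bound.items() if value is None]
    if missing:
        raise TypeError(f"missing required parameter(s): {missing}")
    return bound


a = bind_named_args(PARAMS, path="/path/to/something.psv", delimiter="|")
b = bind_named_args(PARAMS, delimiter="|", path="/path/to/something.psv")
```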

Julian


> On Oct 19, 2015, at 5:18 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> Wouldn't a table function be a better option?
> 
> Something like this perhaps?
> 
> select * from
> delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
> 
> ?
> 
> Or how about fake-o parameters that the delimited record scanner knows how
> to push down into the scanning of the data? That would look like this:
> 
> select * from
> dfs.`default`.`/path/to/file/something.psv`
> where magicFieldDelimiter = '|';
> 
> 
> 
> On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
>> I'm looking into passing information on how to interpret a file through the
>> select clause in Drill.
>> Something along the lines of:
>> *select * from
>> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
>> (In this example, we want to specify a specific delimiter, but that would
>> apply to any *type* of format)
>> 
>> Which would allow to read a file without having to centrally configure
>> formats: https://drill.apache.org/docs/querying-plain-text-files/
>> Which makes it easier to try to read an existing file.
>> Typically once the user has found the proper settings, they would update
>> the central configuration.
>> 
>> thoughts?
>> 
>> --
>> Julien
>>

Re: select from table with options

Posted by Ted Dunning <te...@gmail.com>.

Wouldn't a table function be a better option?

Something like this perhaps?

select * from
delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')

?

Or how about fake-o parameters that the delimited record scanner knows how
to push down into the scanning of the data? That would look like this:

select * from
dfs.`default`.`/path/to/file/something.psv`
where magicFieldDelimiter = '|';
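
A rough sketch of what that push-down could look like, in Python: the
scanner claims equality predicates on the "magic" columns as scan options
and leaves every other predicate as a real row filter. The set of magic
column names and the split_predicates helper are assumptions made up for
this illustration.

```python
# Hypothetical "magic" column names a delimited-text scanner would recognize.
SCAN_OPTIONS = {"magicFieldDelimiter", "magicLineDelimiter"}


def split_predicates(predicates):
    """Separate option-setting predicates from genuine row filters.

    predicates: list of (column, operator, value) tuples joined by AND.
    Returns (scan_options, remaining_filters).
    """
    options, filters = {}, []
    for column, op, value in predicates:
        if column in SCAN_OPTIONS and op == "=":
            options[column] = value
        else:
            filters.append((column, op, value))
    return options, filters


options, filters = split_predicates([
    ("magicFieldDelimiter", "=", "|"),
    ("age", ">", 30),
])
```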



On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <ju...@dremio.com> wrote:

> I'm looking into passing information on how to interpret a file through the
> select clause in Drill.
> Something along the lines of:
> *select * from
> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> (In this example, we want to specify a specific delimiter, but that would
> apply to any *type* of format)
>
> Which would allow to read a file without having to centrally configure
> formats: https://drill.apache.org/docs/querying-plain-text-files/
> Which makes it easier to try to read an existing file.
> Typically once the user has found the proper settings, they would update
> the central configuration.
>
> thoughts?
>
> --
> Julien
>