Posted to dev@beam.apache.org by Gleb Kanterov <gl...@spotify.com> on 2018/11/28 18:36:20 UTC

[ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

At the moment the only non-aggregate UDFs we support are ScalarFunctions,
that is, functions that operate on row fields. Calcite has 3 other kinds of
UDFs: aggregate functions (which we already support), table macros, and
table functions. The difference between table functions and macros is that
macros expand to relations, while table functions can refer to anything
queryable, e.g., enumerables. In Beam SQL, since everything translates to
PTransforms, only table macros are relevant.

UDTFs are in a way similar to external tables, but they don't require
specifying a schema explicitly. Instead, they can derive the schema from
their arguments. One use case would be querying ranges of dataset
partitions using a helper function like:

SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start => '2017-01-01',
end => '2018-01-01'))
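
To make "derive the schema from arguments" concrete, here is a minimal Python sketch. It is not the Beam SQL or Calcite API; the `CATALOG` dictionary, the `read_avro` helper, the field names, the daily partition layout, and the exclusive `end` bound are all assumptions made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical catalog: dataset id -> schema (field name -> SQL type name).
# A real implementation would more likely read the schema from the Avro files.
CATALOG = {
    "dataset": {"user_id": "BIGINT", "country": "VARCHAR"},
}


def read_avro(id, start, end):
    """Toy stand-in for a table function.

    Returns (schema, partitions). The schema is looked up from the `id`
    argument rather than declared by the caller. Daily partitions and an
    exclusive `end` date are assumptions, not something the thread states.
    """
    schema = CATALOG[id]
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    days = (last - first).days
    partitions = [(first + timedelta(days=n)).isoformat() for n in range(days)]
    return schema, partitions


schema, partitions = read_avro(id="dataset", start="2017-01-01", end="2018-01-01")
print(schema)
print(len(partitions), partitions[0], partitions[-1])
```

The caller never writes a schema down; it falls out of the arguments, which is what distinguishes this from declaring an external table.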

With BEAM-6133 [1] (apache/beam#7141 [2]) we would have support for UDTFs
in Beam SQL.

[1] https://issues.apache.org/jira/browse/BEAM-6133
[2] https://github.com/apache/beam/pull/7141

Gleb

Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

Posted by Kenneth Knowles <kl...@google.com>.
OK great. I happen to have recently read that bit of SQL 2016 so that puts
it in context for me very nicely.

Kenn

On Fri, Dec 14, 2018 at 3:58 AM Gleb Kanterov <gl...@spotify.com> wrote:

> Kenn, I don't have a copy of a recent SQL standard to confirm what I'm
> saying. To my knowledge, initially, there was a concept of a table
> function. Table functions should have a static type that doesn't depend on
> supplied arguments. In ANSI SQL 2016, there is a concept of polymorphic
> table functions, that can infer types depending on provided arguments. Both
> TableFunction and TableMacro in Calcite are polymorphic table functions,
> and the difference between TableFunction and TableMacro is internal to
> Calcite.
>
> Gleb

Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

Posted by Gleb Kanterov <gl...@spotify.com>.
Kenn, I don't have a copy of a recent SQL standard to confirm what I'm
saying. To my knowledge, there was initially only the concept of a table
function, which has a static type that doesn't depend on the supplied
arguments. In ANSI SQL 2016, there is a concept of polymorphic table
functions, which can infer their types from the provided arguments. Both
TableFunction and TableMacro in Calcite are polymorphic table functions,
and the difference between them is internal to Calcite.
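
As a rough illustration of the static versus polymorphic distinction, here is a small Python sketch. It models the idea only and is not Calcite's TableFunction or TableMacro interface; `static_range`, `polymorphic_project`, and the `KNOWN_COLUMNS` mapping are hypothetical names.

```python
# Hypothetical column catalog used by the polymorphic example.
KNOWN_COLUMNS = {"id": "INTEGER", "name": "VARCHAR", "ts": "TIMESTAMP"}


def static_range(n):
    """Static table function: the row type is the same for every argument."""
    row_type = (("value", "INTEGER"),)
    rows = [(i,) for i in range(n)]
    return row_type, rows


def polymorphic_project(columns):
    """Polymorphic table function: the row type is computed from the arguments."""
    return tuple((c, KNOWN_COLUMNS[c]) for c in columns)


# The static function's row type does not change with its argument.
print(static_range(2)[0] == static_range(5)[0])
# The polymorphic function's row type follows the requested columns.
print(polymorphic_project(["id"]))
print(polymorphic_project(["id", "name"]))
```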

Gleb



On Fri, Dec 14, 2018 at 4:26 AM Kenneth Knowles <ke...@apache.org> wrote:

> Sorry for the slow reply & review. Having UDTF support in Beam SQL is
> extremely useful. Are both table functions and table macros part of
> "standard" SQL or is this a distinction between different Calcite concepts?
>
> Kenn

-- 
Cheers,
Gleb

Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

Posted by Kenneth Knowles <ke...@apache.org>.
Sorry for the slow reply & review. Having UDTF support in Beam SQL is
extremely useful. Are both table functions and table macros part of
"standard" SQL or is this a distinction between different Calcite concepts?

Kenn
