You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Gavin Ray <ra...@gmail.com> on 2022/07/29 23:27:34 UTC

New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

I see there's a recent commit which references this:
[CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/CALCITE-4864>

Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
runtime.

I had an opportunity to speak to the maintainers, and they said this was a
key strategy in writing adapters
for non-relational sources like Couchbase, MongoDB, and the Salesforce API,
etc.

If I am understanding correctly, does this new polymorphic table
functionality unlock the
ability to have Calcite consume runtime data and use it with table-valued
functions?

If so, that's huge! Would love to whip up a demo of an HTTP request UDF
that does a JSON_TABLE on the output.

If not, I'd still love to learn what this functionality is all about and
what you might do with it =)

----

[0]: teiid/teiid: Teiid is a data virtualization system that allows
applications to use data from multiple, heterogenous data stores.
(github.com) <https://github.com/teiid/teiid>

Re: New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

Posted by Jing Zhang <be...@gmail.com>.
Hi, Ray,

> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>    HttpReader(
>       method => 'GET'
>       url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1" returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
I think it could be done by a polymorphic table function.
Row type of result table is not declared when the function is created.
It depends on the function arguments, in your case, is 'url' literal
argument.
The row type of result table could be inferred when compile the specified
query.
About how to get the json schema?
Perhaps directly get the json result of literal 'url' argument, analyze the
schema.
The json result could be cached because it would be used again in the later
execution phase.

Best,
Jing Zhang

Gavin Ray <ra...@gmail.com> 于2022年8月1日周一 04:40写道:

> Thank you for the comprehensive response, Jing =)
>
> For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
>
>
>  FROM TABLE (
> >   CSVreader (File => 'abc.csv')
> > )
>
>
> Ah yes this seems about what I was hoping for!
> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>    HttpReader(
>       method => 'GET'
>       url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1" returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
>
> On Sun, Jul 31, 2022 at 10:37 AM Jing Zhang <be...@gmail.com> wrote:
>
> > Hi,
> > Thanks for your attention.
> > Yes, we hope to provide full support for polymorphic table functions.
> > We have made some progress in version 1.31.0 which will be released soon.
> > Any feedback is welcome.
> >
> > I'm not sure whether the polymorphic table function could satisfy your
> > requirement because I don't know your demands very well yet.
> > It would be very helpful if you could provide some more detailed
> > description.
> >
> > Here, I would like to share something about polymorphic table function.
> > A polymorphic table function is a function that returns a table whose row
> > type is not declared when the function is created.
> > Rather, the row type of the result may depend on the function arguments
> in
> > the invocation of a PTF,
> > and therefore may vary depending on the precise syntax containing the PTF
> > invocation.
> > In addition, a PTF may have generic table parameters (i.e., no row type
> > declared when the PTF is created),
> > and the row type of the result might depend on the row type(s) of the
> input
> > tables.
> >
> > For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
> > as a table.
> > For example, there is a csv file named abc.csv with the following
> contents:
> >
> > docno,name,due_date,principle,interest
> > 123,Mary,01/01/2014,234.56,345.67
> > 234,Edgar,01/01/2014,654.32,543.21
> >
> >
> > the query author may write a query such as the following:
> >
> >
> > SELECT *
> > FROM TABLE (
> >   CSVreader (
> >     File => 'abc.csv',
> >     Floats => DESCRIPTOR ("principle", "interest")
> >     Dates => DESCRIPTOR ("due_date")
> >   )
> > ) AS S
> >
> >
> > The result will be
> > docnonamedue_dateprincipleinterest
> > 123 Mary 01/01/2014 234.56 345.67
> > 234 Edgar 01/01/2014 654.32 543.21
> >
> >
> > More examples could be found in the description of CALCITE-4865[1].
> > Users could define a custom polymorphic table function by inheriting a
> > related interface[2].
> >
> > Best,
> > Jing Zhang
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4865
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlTableFunction.java
> >
> > Gavin Ray <ra...@gmail.com> 于2022年7月30日周六 07:27写道:
> >
> > > I see there's a recent commit which references this:
> > > [CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (
> > apache.org)
> > > <https://issues.apache.org/jira/browse/CALCITE-4864>
> > >
> > > Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
> > > runtime.
> > >
> > > I had an opportunity to speak to the maintainers, and they said this
> was
> > a
> > > key strategy in writing adapters
> > > for non-relational sources like Couchbase, MongoDB, and the Salesforce
> > API,
> > > etc.
> > >
> > > If I am understanding correctly, does this new polymorphic table
> > > functionality unlock the
> > > ability to have Calcite consume runtime data and use it with
> table-valued
> > > functions?
> > >
> > > If so, that's huge! Would love to whip up a demo of an HTTP request UDF
> > > that does a JSON_TABLE on the output.
> > >
> > > If not, I'd still love to learn what this functionality is all about
> and
> > > what you might do with it =)
> > >
> > > ----
> > >
> > > [0]: teiid/teiid: Teiid is a data virtualization system that allows
> > > applications to use data from multiple, heterogenous data stores.
> > > (github.com) <https://github.com/teiid/teiid>
> > >
> >
>

Re: New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

Posted by Gavin Ray <ra...@gmail.com>.
Thank you for the comprehensive response, Jing =)

For example, We often need to consume a CSV file.
> Generally, the first line of the file contains a list of column names, and
> subsequent lines of the file contain data.
> However, different CSV files may have different schemas.
> We could offer a table function named CSVreader which interprets this file


 FROM TABLE (
>   CSVreader (File => 'abc.csv')
> )


Ah yes this seems about what I was hoping for!
I assume somebody could write custom function to do something like this:

SELECT *
FROM TABLE (
   HttpReader(
      method => 'GET'
      url => 'https://api.mysite.com/users/1'
  )
)

Where "https://api.mysite.com/users/1" returns JSON data like:

{ "id": 1, "name": "Some Person", "is_registered": true }

And then we can infer table definition:

TABLE (
  id INT
  name TEXT
  is_registered BOOLEAN
)

And read JSON as row(s):

id | name | is_registered
1  | some | true


On Sun, Jul 31, 2022 at 10:37 AM Jing Zhang <be...@gmail.com> wrote:

> Hi,
> Thanks for your attention.
> Yes, we hope to provide full support for polymorphic table functions.
> We have made some progress in version 1.31.0 which will be released soon.
> Any feedback is welcome.
>
> I'm not sure whether the polymorphic table function could satisfy your
> requirement because I don't know your demands very well yet.
> It would be very helpful if you could provide some more detailed
> description.
>
> Here, I would like to share something about polymorphic table function.
> A polymorphic table function is a function that returns a table whose row
> type is not declared when the function is created.
> Rather, the row type of the result may depend on the function arguments in
> the invocation of a PTF,
> and therefore may vary depending on the precise syntax containing the PTF
> invocation.
> In addition, a PTF may have generic table parameters (i.e., no row type
> declared when the PTF is created),
> and the row type of the result might depend on the row type(s) of the input
> tables.
>
> For example, We often need to consume a CSV file.
> Generally, the first line of the file contains a list of column names, and
> subsequent lines of the file contain data.
> However, different CSV files may have different schemas.
> We could offer a table function named CSVreader which interprets this file
> as a table.
> For example, there is a csv file named abc.csv with the following contents:
>
> docno,name,due_date,principle,interest
> 123,Mary,01/01/2014,234.56,345.67
> 234,Edgar,01/01/2014,654.32,543.21
>
>
> the query author may write a query such as the following:
>
>
> SELECT *
> FROM TABLE (
>   CSVreader (
>     File => 'abc.csv',
>     Floats => DESCRIPTOR ("principle", "interest")
>     Dates => DESCRIPTOR ("due_date")
>   )
> ) AS S
>
>
> The result will be
> docnonamedue_dateprincipleinterest
> 123 Mary 01/01/2014 234.56 345.67
> 234 Edgar 01/01/2014 654.32 543.21
>
>
> More examples could be found in the description of CALCITE-4865[1].
> Users could define a custom polymorphic table function by inheriting a
> related interface[2].
>
> Best,
> Jing Zhang
>
> [1] https://issues.apache.org/jira/browse/CALCITE-4865
> [2]
>
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlTableFunction.java
>
> Gavin Ray <ra...@gmail.com> 于2022年7月30日周六 07:27写道:
>
> > I see there's a recent commit which references this:
> > [CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (
> apache.org)
> > <https://issues.apache.org/jira/browse/CALCITE-4864>
> >
> > Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
> > runtime.
> >
> > I had an opportunity to speak to the maintainers, and they said this was
> a
> > key strategy in writing adapters
> > for non-relational sources like Couchbase, MongoDB, and the Salesforce
> API,
> > etc.
> >
> > If I am understanding correctly, does this new polymorphic table
> > functionality unlock the
> > ability to have Calcite consume runtime data and use it with table-valued
> > functions?
> >
> > If so, that's huge! Would love to whip up a demo of an HTTP request UDF
> > that does a JSON_TABLE on the output.
> >
> > If not, I'd still love to learn what this functionality is all about and
> > what you might do with it =)
> >
> > ----
> >
> > [0]: teiid/teiid: Teiid is a data virtualization system that allows
> > applications to use data from multiple, heterogenous data stores.
> > (github.com) <https://github.com/teiid/teiid>
> >
>

Re: New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

Posted by Jing Zhang <be...@gmail.com>.
Hi,
Thanks for your attention.
Yes, we hope to provide full support for polymorphic table functions.
We have made some progress in version 1.31.0 which will be released soon.
Any feedback is welcome.

I'm not sure whether the polymorphic table function could satisfy your
requirement because I don't know your demands very well yet.
It would be very helpful if you could provide some more detailed
description.

Here, I would like to share something about polymorphic table function.
A polymorphic table function is a function that returns a table whose row
type is not declared when the function is created.
Rather, the row type of the result may depend on the function arguments in
the invocation of a PTF,
and therefore may vary depending on the precise syntax containing the PTF
invocation.
In addition, a PTF may have generic table parameters (i.e., no row type
declared when the PTF is created),
and the row type of the result might depend on the row type(s) of the input
tables.

For example, We often need to consume a CSV file.
Generally, the first line of the file contains a list of column names, and
subsequent lines of the file contain data.
However, different CSV files may have different schemas.
We could offer a table function named CSVreader which interprets this file
as a table.
For example, there is a csv file named abc.csv with the following contents:

docno,name,due_date,principle,interest
123,Mary,01/01/2014,234.56,345.67
234,Edgar,01/01/2014,654.32,543.21


the query author may write a query such as the following:


SELECT *
FROM TABLE (
  CSVreader (
    File => 'abc.csv',
    Floats => DESCRIPTOR ("principle", "interest")
    Dates => DESCRIPTOR ("due_date")
  )
) AS S


The result will be
docnonamedue_dateprincipleinterest
123 Mary 01/01/2014 234.56 345.67
234 Edgar 01/01/2014 654.32 543.21


More examples could be found in the description of CALCITE-4865[1].
Users could define a custom polymorphic table function by inheriting a
related interface[2].

Best,
Jing Zhang

[1] https://issues.apache.org/jira/browse/CALCITE-4865
[2]
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlTableFunction.java

Gavin Ray <ra...@gmail.com> 于2022年7月30日周六 07:27写道:

> I see there's a recent commit which references this:
> [CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (apache.org)
> <https://issues.apache.org/jira/browse/CALCITE-4864>
>
> Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
> runtime.
>
> I had an opportunity to speak to the maintainers, and they said this was a
> key strategy in writing adapters
> for non-relational sources like Couchbase, MongoDB, and the Salesforce API,
> etc.
>
> If I am understanding correctly, does this new polymorphic table
> functionality unlock the
> ability to have Calcite consume runtime data and use it with table-valued
> functions?
>
> If so, that's huge! Would love to whip up a demo of an HTTP request UDF
> that does a JSON_TABLE on the output.
>
> If not, I'd still love to learn what this functionality is all about and
> what you might do with it =)
>
> ----
>
> [0]: teiid/teiid: Teiid is a data virtualization system that allows
> applications to use data from multiple, heterogenous data stores.
> (github.com) <https://github.com/teiid/teiid>
>