You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Jing Zhang <be...@gmail.com> on 2022/08/02 07:53:14 UTC

Re: New Polymorphic Tables functionality -- interesting use cases or am I misunderstanding?

Hi, Ray,

> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>    HttpReader(
>       method => 'GET'
>       url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1" returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
I think it could be done by a polymorphic table function.
Row type of result table is not declared when the function is created.
It depends on the function arguments, in your case, is 'url' literal
argument.
The row type of result table could be inferred when compile the specified
query.
About how to get the json schema?
Perhaps directly get the json result of literal 'url' argument, analyze the
schema.
The json result could be cached because it would be used again in the later
execution phase.

Best,
Jing Zhang

Gavin Ray <ra...@gmail.com> 于2022年8月1日周一 04:40写道:

> Thank you for the comprehensive response, Jing =)
>
> For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
>
>
>  FROM TABLE (
> >   CSVreader (File => 'abc.csv')
> > )
>
>
> Ah yes this seems about what I was hoping for!
> I assume somebody could write custom function to do something like this:
>
> SELECT *
> FROM TABLE (
>    HttpReader(
>       method => 'GET'
>       url => 'https://api.mysite.com/users/1'
>   )
> )
>
> Where "https://api.mysite.com/users/1" returns JSON data like:
>
> { "id": 1, "name": "Some Person", "is_registered": true }
>
> And then we can infer table definition:
>
> TABLE (
>   id INT
>   name TEXT
>   is_registered BOOLEAN
> )
>
> And read JSON as row(s):
>
> id | name | is_registered
> 1  | some | true
>
>
> On Sun, Jul 31, 2022 at 10:37 AM Jing Zhang <be...@gmail.com> wrote:
>
> > Hi,
> > Thanks for your attention.
> > Yes, we hope to provide full support for polymorphic table functions.
> > We have made some progress in version 1.31.0 which will be released soon.
> > Any feedback is welcome.
> >
> > I'm not sure whether the polymorphic table function could satisfy your
> > requirement because I don't know your demands very well yet.
> > It would be very helpful if you could provide some more detailed
> > description.
> >
> > Here, I would like to share something about polymorphic table function.
> > A polymorphic table function is a function that returns a table whose row
> > type is not declared when the function is created.
> > Rather, the row type of the result may depend on the function arguments
> in
> > the invocation of a PTF,
> > and therefore may vary depending on the precise syntax containing the PTF
> > invocation.
> > In addition, a PTF may have generic table parameters (i.e., no row type
> > declared when the PTF is created),
> > and the row type of the result might depend on the row type(s) of the
> input
> > tables.
> >
> > For example, We often need to consume a CSV file.
> > Generally, the first line of the file contains a list of column names,
> and
> > subsequent lines of the file contain data.
> > However, different CSV files may have different schemas.
> > We could offer a table function named CSVreader which interprets this
> file
> > as a table.
> > For example, there is a csv file named abc.csv with the following
> contents:
> >
> > docno,name,due_date,principle,interest
> > 123,Mary,01/01/2014,234.56,345.67
> > 234,Edgar,01/01/2014,654.32,543.21
> >
> >
> > the query author may write a query such as the following:
> >
> >
> > SELECT *
> > FROM TABLE (
> >   CSVreader (
> >     File => 'abc.csv',
> >     Floats => DESCRIPTOR ("principle", "interest")
> >     Dates => DESCRIPTOR ("due_date")
> >   )
> > ) AS S
> >
> >
> > The result will be
> > docnonamedue_dateprincipleinterest
> > 123 Mary 01/01/2014 234.56 345.67
> > 234 Edgar 01/01/2014 654.32 543.21
> >
> >
> > More examples could be found in the description of CALCITE-4865[1].
> > Users could define a custom polymorphic table function by inheriting a
> > related interface[2].
> >
> > Best,
> > Jing Zhang
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4865
> > [2]
> >
> >
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlTableFunction.java
> >
> > Gavin Ray <ra...@gmail.com> 于2022年7月30日周六 07:27写道:
> >
> > > I see there's a recent commit which references this:
> > > [CALCITE-4864] Supports Polymorphic Table function - ASF JIRA (
> > apache.org)
> > > <https://issues.apache.org/jira/browse/CALCITE-4864>
> > >
> > > Something Teiid [0] did was to use JSON_TABLE() from dynamic data at
> > > runtime.
> > >
> > > I had an opportunity to speak to the maintainers, and they said this
> was
> > a
> > > key strategy in writing adapters
> > > for non-relational sources like Couchbase, MongoDB, and the Salesforce
> > API,
> > > etc.
> > >
> > > If I am understanding correctly, does this new polymorphic table
> > > functionality unlock the
> > > ability to have Calcite consume runtime data and use it with
> table-valued
> > > functions?
> > >
> > > If so, that's huge! Would love to whip up a demo of an HTTP request UDF
> > > that does a JSON_TABLE on the output.
> > >
> > > If not, I'd still love to learn what this functionality is all about
> and
> > > what you might do with it =)
> > >
> > > ----
> > >
> > > [0]: teiid/teiid: Teiid is a data virtualization system that allows
> > > applications to use data from multiple, heterogenous data stores.
> > > (github.com) <https://github.com/teiid/teiid>
> > >
> >
>