
Posted to dev@calcite.apache.org by Austin Richardson <au...@teampicnic.com> on 2023/12/11 13:24:00 UTC

Calcite tables which support multiple data retrieval patterns

Hello,


My team is working on a feature to add a general “base table” for a
collection of Calcite tables in our service. Something we’ve come across is
that different tables will have different supported data retrieval
patterns, which we’ve been referring to as “indices”.

For example, assume we have a service with two APIs:


   - getByPrice(<inputs>) → <results>
   - getByPriceAndTime(<inputs>) → <results>


In this case, the Calcite table which will expose this service has two
possible ways to retrieve data (i.e. there are two “indices”). Since the
table implementation is meant to be generic, we would like to devise a
general solution for encoding these indices in Calcite (with relative
priority in case one index is preferred over the other), and having Calcite
choose the “best” one for a given query.
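
A minimal sketch of the selection problem described above, assuming each "index" can be described by the set of columns it needs constrained plus a relative priority. The names Index, requiredColumns, priority, and choose are hypothetical and are not Calcite APIs. In Calcite itself the choice would more likely be expressed through planner rules and costs, so this only illustrates the matching logic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class IndexChooser {
  /** Hypothetical description of one retrieval pattern ("index"). */
  record Index(String name, Set<String> requiredColumns, int priority) {}

  /**
   * Returns the highest-priority index whose required columns are all
   * constrained by the query, or empty if no index is usable.
   */
  static Optional<Index> choose(List<Index> indices, Set<String> filteredColumns) {
    return indices.stream()
        .filter(ix -> filteredColumns.containsAll(ix.requiredColumns()))
        .max(Comparator.comparingInt(Index::priority));
  }

  public static void main(String[] args) {
    List<Index> indices = List.of(
        new Index("getByPrice", Set.of("price"), 1),
        new Index("getByPriceAndTime", Set.of("price", "time"), 2));

    // A query filtering on both columns can use either API; priority decides.
    System.out.println(choose(indices, Set.of("price", "time")).map(Index::name));
    // A query filtering only on price can use just one of them.
    System.out.println(choose(indices, Set.of("price")).map(Index::name));
  }
}
```

A query that constrains none of the required columns yields an empty result, which a real table would have to handle with a full scan or an error.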

Have you encountered this problem or something similar before? We have some
ideas for how to solve it, but wanted to see if this is a common pattern.


Thank you,

Austin

Re: Calcite tables which support multiple data retrieval patterns

Posted by Julian Hyde <jh...@gmail.com>.

PS: Austin, you and the people on the cc: list should subscribe to dev@calcite. Subscribing lets you post without moderation (which often adds a delay of several hours) and receive replies.


Re: Calcite tables which support multiple data retrieval patterns

Posted by Julian Hyde <jh...@gmail.com>.

Thanks for starting this conversation.

I believe we should be looking for common patterns and devising ways to solve them using relational algebra. (Because that’s the hammer that we have in Calcite… and it’s a very powerful hammer. For example, we can represent all kinds of materialized views, including indexes, sorted tables, and summary tables, by just saying ‘the contents of table T are always equivalent to query Q’.)
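
To make the ‘table T is equivalent to query Q’ idea concrete, here is a toy sketch. It is not Calcite's materialized-view machinery: Rel, Scan, Filter, Materialization, and substitute are hypothetical names, and matching is by plain structural equality, whereas real view matching is considerably more sophisticated.

```java
import java.util.List;

final class MaterializationSketch {
  /** Toy relational expression; stands in for a real plan node. */
  interface Rel {}
  record Scan(String table) implements Rel {}
  record Filter(Rel input, String condition) implements Rel {}

  /** Declares that the contents of `table` always equal the result of `query`. */
  record Materialization(String table, Rel query) {}

  /** Replaces any sub-expression equal to a declared query with a scan of its table. */
  static Rel substitute(Rel rel, List<Materialization> materializations) {
    for (Materialization m : materializations) {
      if (m.query().equals(rel)) {
        return new Scan(m.table());
      }
    }
    if (rel instanceof Filter f) {
      return new Filter(substitute(f.input(), materializations), f.condition());
    }
    return rel;
  }

  public static void main(String[] args) {
    // Hypothetical table "orders_by_price" is declared equivalent to:
    //   SELECT * FROM orders WHERE price = 10
    Rel q = new Filter(new Scan("orders"), "price = 10");
    List<Materialization> ms = List.of(new Materialization("orders_by_price", q));

    // A query that contains Q as a sub-expression is rewritten to read T instead.
    Rel query = new Filter(q, "time > 5");
    System.out.println(substitute(query, ms));
  }
}
```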

Your service calls remind me of the REST adapter (extensions to the file adapter to send parameters to HTTP servers) [1]. Whereas the REST adapter would talk to non-SQL back-ends by imagining that they are parameterized queries, your proposal would create a REST-like interface as an alternative to SQL.

Your API calls also remind me of ‘parameterized views’. If the API has zero parameters, it looks like a SQL view. If it has one or more parameters, the best you can do in current SQL is to create a table function. The problem with table functions, if they are implemented in a procedural language such as Java, is that the optimizer can’t see into them, push filters, or do other optimizations. So Calcite has a concept called table macros [2]. They are inspired by Lisp macros, which look like functions but return an AST and can therefore be used for metaprogramming.
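
The difference can be sketched in plain Java, under heavy assumptions. Nothing below is a Calcite API: Plan, ApiCall, Filter, getByPriceMacro, and pushTimeFilter are hypothetical names. The point is only that a macro-style function returns a description of the call, so a rewrite rule can inspect it and fold a filter into it, which is impossible when the function has already returned rows.

```java
import java.util.Map;
import java.util.TreeMap;

final class MacroSketch {
  /** Toy plan node; stands in for the AST a macro would return. */
  interface Plan {}
  record ApiCall(String api, Map<String, Object> args) implements Plan {}
  record Filter(Plan input, String column, Object value) implements Plan {}

  /** Macro style: describes the service call instead of executing it. */
  static Plan getByPriceMacro(int price) {
    return new ApiCall("getByPrice", Map.of("price", price));
  }

  /**
   * Toy rewrite rule: an equality filter on "time" over getByPrice becomes
   * a single getByPriceAndTime call. Any other plan is returned unchanged.
   */
  static Plan pushTimeFilter(Plan plan) {
    if (plan instanceof Filter f
        && f.column().equals("time")
        && f.input() instanceof ApiCall call
        && call.api().equals("getByPrice")) {
      Map<String, Object> args = new TreeMap<>(call.args());
      args.put("time", f.value());
      return new ApiCall("getByPriceAndTime", args);
    }
    return plan;
  }

  public static void main(String[] args) {
    // Roughly: SELECT * FROM getByPrice(10) WHERE time = '2023-12-11'
    Plan query = new Filter(getByPriceMacro(10), "time", "2023-12-11");
    System.out.println(pushTimeFilter(query));
  }
}
```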

I think your API calls would map naturally to table macros, but maybe we need a new catalog element (and/or DDL) to make them usable to people who don’t want to use a Java API.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-4035 

[2] https://calcite.apache.org/docs/adapter.html#table-functions-and-table-macros 
