Posted to dev@calcite.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2018/06/21 09:06:42 UTC

Creating filter expressions with java predicates

Hi all,

I am trying to replace pieces of an old query execution framework with
Calcite. Consider, for example, the following very simplified representation
of a Query in this framework.

class Query
{
  private final String pathToTableData;
  private final Predicate<Record> pred1;
  private final Predicate<Record> pred2;
}

Basically, it corresponds to a TableScan with Filter(s) so I would like to
map it as such to Calcite.
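
To make the intended mapping concrete, here is a minimal plain-Java sketch of what "TableScan with Filter(s)" means for such a Query. It uses no Calcite classes; `LegacyRecord` and `ScanFilterSketch` are hypothetical stand-ins (the real Record type is not shown in this thread), and the table is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the framework's Record type: one row of column values.
final class LegacyRecord {
  final Object[] values;

  LegacyRecord(Object... values) {
    this.values = values;
  }
}

final class ScanFilterSketch {
  // Emulates Filter(Filter(TableScan)): rows that satisfy both predicates
  // pass through unchanged; no rows are added and the row type is untouched.
  static List<LegacyRecord> scanAndFilter(List<LegacyRecord> table,
      Predicate<LegacyRecord> pred1, Predicate<LegacyRecord> pred2) {
    Predicate<LegacyRecord> condition = pred1.and(pred2);
    List<LegacyRecord> out = new ArrayList<>();
    for (LegacyRecord r : table) {
      if (condition.test(r)) {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<LegacyRecord> emp = Arrays.asList(
        new LegacyRecord(1, "Ann", 10),
        new LegacyRecord(2, "Bob", 20),
        new LegacyRecord(3, "", 10));
    List<LegacyRecord> result = scanAndFilter(emp,
        r -> Integer.valueOf(10).equals(r.values[2]),   // DEPTNO = 10
        r -> !r.values[1].toString().isEmpty());        // ENAME is not empty
    System.out.println(result.size() + " row(s) passed the filters");
  }
}
```

Two stacked filters are equivalent to one filter on the conjunction, which is why the sketch combines the predicates with `Predicate.and`.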

*Approach A*
The first thing that came to my mind is to use the RelBuilder and
RexBuilder classes. However, when I am about to create a filter, I am
not sure how to create a row expression that can describe the above
predicates. At the moment, I have the following ideas:

   1. One possibility would be to create a RexCall expression with a custom
   SqlOperator that takes the java predicate among its parameters.
   2. Another possibility would be to create a RexCall expression with a
   SqlUserDefinedFunction, which already accepts a Function object as input;
   there I could pass a custom Function object that contains the java
   Predicate.

Then, during query planning, it is necessary to add an appropriate rule that
goes over the condition of the filter, examines the SqlOperator, and
introduces a CustomFilter expression for handling this call.

*Approach B*
The second alternative would be not to use the RelBuilder at all and to
build the plan manually with custom relational expressions (subclasses of
RelNode). The CustomFilter expression could then take a java predicate as
an argument, and there would be no need to introduce further changes or rules.
This implies that the CustomFilter does not have a RexNode condition,
which in turn means that it cannot inherit from the existing Filter class.

I was wondering if any of the above approaches seems reasonable and
if there are better alternatives that I am missing. Suggestions and
comments are very welcome.

Re: Creating filter expressions with java predicates

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Julian,

Thanks a lot for the suggestions.

Unfortunately, the API (public and quite old) for defining the predicates
is quite permissive and it does not impose anything regarding a public
no-arg constructor or stateless behavior.

I don't think a TableFunctionScan is a good fit for the use case I
described. Maybe it was not clear from my previous example, but the
predicate takes a single row of the input relational expression as input
and either lets it pass or not.
I see how the TableFunctionScan could work, but its semantics are quite
different from those of a Filter (most importantly, a Filter cannot
introduce new rows or change the row type of its input expression).

Continuing on the idea of using a user-defined function, I would say that
one argument is the type of the table (record type?) and the output is a
boolean, and this brings me to another question.

*How can we describe with a row expression (RexNode) that the input to a
function is the complete row from the input relational expression?*

For the sake of the example, let's assume that we have a Scan over the
following table: EMP(EMPNO, ENAME, DEPTNO). Furthermore, we want to apply a
Filter with a function similar to the one below:
boolean isValidEmployee(Object[] emp)

If I had to create such an expression manually, I think it would be
something with the following structure:

RexCall(RexCall(RexInputRef,RexInputRef,RexInputRef)) => ISVALIDEMPLOYEE(ROW($0, $1, $2))

or

RexCall(RexCall(RexLiteral)) => ISVALIDEMPLOYEE(RINPUT(0))

Is there a Calcite convention on how to treat this situation (possibly test
cases which exhibit such use-cases)?
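
To pin down the semantics intended by the first form, here is a plain-Java sketch that does not use any Calcite classes. The `row` helper mimics what ROW($0, $1, $2) is meant to do (gather the referenced input fields into one array), and the validity rule inside `isValidEmployee` is made up purely for illustration; `WholeRowFilterSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class WholeRowFilterSketch {
  // Hypothetical rule: EMPNO must be a positive integer and ENAME must be set.
  static boolean isValidEmployee(Object[] emp) {
    return emp[0] instanceof Integer && ((Integer) emp[0]) > 0 && emp[1] != null;
  }

  // Mimics ROW($i, $j, ...): copies the referenced input fields into a new array.
  static Object[] row(Object[] input, int... refs) {
    Object[] out = new Object[refs.length];
    for (int i = 0; i < refs.length; i++) {
      out[i] = input[refs[i]];
    }
    return out;
  }

  // Mimics Filter(condition = ISVALIDEMPLOYEE(ROW($0, $1, $2))) over EMP rows.
  static List<Object[]> filter(List<Object[]> input) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] r : input) {
      if (isValidEmployee(row(r, 0, 1, 2))) {
        out.add(r); // the original row passes through unchanged
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Object[]> emp = new ArrayList<>();
    emp.add(new Object[] {7369, "SMITH", 20});
    emp.add(new Object[] {-1, "GHOST", 10});
    emp.add(new Object[] {7499, null, 30});
    System.out.println(filter(emp).size() + " valid employee row(s)");
  }
}
```

The function only ever sees a copy of the row's fields and returns a boolean, so the filter cannot add rows or alter the row type.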

Best,
Stamatis

2018-06-22 2:54 GMT+02:00 Julian Hyde <jh...@apache.org>:

> Regardless of how you create it, it’s difficult to pass arbitrary objects
> into a plan. If you can ensure that each predicate has a public
> no-arguments constructor, you could pass the predicate’s class name. Then
> your custom operator can instantiate the predicate.


> One option is to create a user-defined table function. One of its
> arguments will be a cursor (the input relational expression) and other
> arguments will be the names of the predicate classes. Its output is a
> cursor.
>
> There is currently no method in RelBuilder to add a table function scan
> (see https://issues.apache.org/jira/browse/CALCITE-1515) but you can create
> one manually:
>
>   RelBuilder relBuilder;
>   …
>   List<RelNode> inputs = ImmutableList.of(relBuilder.build());
>   relBuilder.push(new TableFunctionScan(…, inputs, …));
>
> Because of RelBuilder’s stack model, you can easily mix RelNodes that it
> creates with RelNodes you create manually.
>
> Julian

Re: Creating filter expressions with java predicates

Posted by Julian Hyde <jh...@apache.org>.
Regardless of how you create it, it’s difficult to pass arbitrary objects into a plan. If you can ensure that each predicate has a public no-arguments constructor, you could pass the predicate’s class name. Then your custom operator can instantiate the predicate.
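
A minimal sketch of the class-name idea, in plain Java with no Calcite classes: only the class name (a String) travels through the plan, and the operator instantiates the predicate reflectively when it needs it. `NonEmptyName` and `PredicateLoader` are hypothetical names, and rows are assumed to be Object[] for the sake of the example.

```java
import java.util.function.Predicate;

// Hypothetical predicate with a public no-arguments constructor and no state.
class NonEmptyName implements Predicate<Object[]> {
  public NonEmptyName() {
  }

  @Override public boolean test(Object[] row) {
    return row[1] != null && !row[1].toString().isEmpty();
  }
}

final class PredicateLoader {
  // Instantiates a predicate from its class name. Fails if the class is
  // missing, is not a Predicate, or lacks a public no-arguments constructor.
  @SuppressWarnings("unchecked")
  static Predicate<Object[]> instantiate(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      if (!Predicate.class.isAssignableFrom(clazz)) {
        throw new IllegalArgumentException(className + " is not a Predicate");
      }
      return (Predicate<Object[]>) clazz.getConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    // Only this String would need to be carried inside the plan.
    String name = NonEmptyName.class.getName();
    Predicate<Object[]> pred = instantiate(name);
    System.out.println(pred.test(new Object[] {1, "Ann", 10}));
  }
}
```

This only works for predicates that can be rebuilt from nothing; a predicate that carries constructor arguments or mutable state cannot be recovered from its class name alone.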

One option is to create a user-defined table function. One of its arguments will be a cursor (the input relational expression) and other arguments will be the names of the predicate classes. Its output is a cursor. 

There is currently no method in RelBuilder to add a table function scan (see https://issues.apache.org/jira/browse/CALCITE-1515) but you can create one manually:

  RelBuilder relBuilder;
  …
  List<RelNode> inputs = ImmutableList.of(relBuilder.build());
  relBuilder.push(new TableFunctionScan(…, inputs, …));

Because of RelBuilder’s stack model, you can easily mix RelNodes that it creates with RelNodes you create manually.

Julian

