You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Haisheng Yuan <hy...@apache.org> on 2020/07/20 01:00:22 UTC

[DISCUSS] New RexNode: RexListCmp

Hi all,

This is a rough idea, I'd like to see how the community think about it.

RexListCmp extends RexNode / RexCall {
  public final SqlOperator op;
  public final RexNode left;
  public final ImmutableList<RexNode> list;
  public final RexQuantifier quantifier;
  public final RelDataType type;
}

Enum RexQuantifier {
  ALL,
  ANY
}

Background:

It is not uncommon that the query contains large number of constant IN list, e.g.
1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, ...., 10000);
2) SELECT * FROM bar WHERE b IN (1, 2, 3, ...., 10000);

Currently, Calcite either translates it into a Join, or expand to OR/AND, which is inefficient, and may cause problems.

With RexListCmp, the predicate in query 1) will be represented as:
RexListCmp {
  op = "<>",
  left = "a"
  list = "1,2,3...10000"
  quantifier = "ALL"
}

The predicate in query 2) will be represented as:
RexListCmp {
  op = "=",
  left = "b"
  list = "1,2,3...10000"
  quantifier = "ANY"
}

It may also be used to represent the predicate in the following query:

SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000, 1000));

Further more, it is extensible. The op is not limited to be equals or not equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator like geospatial operator intersect:
boolean &&( geometry A , geometry B )

Thoughts?

Thanks,
Haisheng Yuan



Re: [DISCUSS] New RexNode: RexListCmp

Posted by Scott Reynolds <sd...@gmail.com>.
I wanted to reply and share our recent requirement for handling SQL like
the following `error_code IN (3002, 3030)' and the challenges we
faced. For our implementation on top of Apache Kudu, each disjunction
creates a `Scanner' – a resource we need to limit as it represents a
denial of service attack vector (e.g. too many scanners, heap fills
up). Good news for us is Kudu ships with an [`inListPredicate'] and we
expected a plan to include the `SqlKind.IN' as the function which we
could translate into `inListPredicate'. We were surprised when it didn't
do that. We did eventually make this work for our customers with a hack
below but it is not valid plan – for instance unparsing the plan
produces invalid SQL query – and therefore is brittle (but *works*
:fingers-crossed:) .

┌────
│ // This is not the correct use of Array.
│ final RelDataType listType =
builder.getTypeFactory().createArrayType(fieldType, -1);
│ return builder.call(SqlStdOperatorTable.IN,
│          builder.field(conditionTableName, columnName),
│          rexBuilder.makeLiteral(resultValue, listType, true));
└────
We filed a ticket to do it the correct way, which is to take all the
disjunctions, and "un-parse" them into `inListPredicate' calls *if
possible*. This struck us as pretty dense code *but* would apply to
other disjunctions.

It would be *great* if Calcite shipped with a `RexCall' that our
implementation could translate with little effort into a
`inListPredicate'.


[`inListPredicate']
https://kudu.apache.org/apidocs/org/apache/kudu/client/KuduPredicate.html#newInListPredicate-org.apache.kudu.ColumnSchema-java.util.List-

On Mon, Jul 20, 2020 at 3:09 PM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Another quick thought as far as it concerns the IN operator would be to use
> RexCall as it is right now where the first operand in the list is a
> RexInputRef for instance and the rest are the literals.
> I assume that taking this direction would need to change a bit the
> respective SqlOperator.
>
> I haven't thought of this thoroughly so maybe there are important things
> that I am missing.
>
> Best,
> Stamatis
>
>
> On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde <jh...@apache.org> wrote:
>
> > The name isn't very intuitive.
> >
> > The concept of a list and a comparison operator seems OK. As Vladimir
> > points out, it is somewhat similar to RexSubQuery, so maybe this could
> > be a sub-class (but organizing the data a bit more efficiently).
> >
> > I would be very wary of null semantics. RexNode scalar operators are
> > forced to do 3-valued logic, but this is almost a relational operator
> > and it would be better without that burden.
> >
> > Julian
> >
> >
> >
> > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
> > <si...@gmail.com> wrote:
> > >
> > > >Do you know what is the impact on Enumerable implementation?
> > >
> > > I guess there are plenty of options there.
> > >
> > > The key question regarding RexListCmp is as we introduce a new Rex
> node,
> > > all the planning rules and all engines
> > > must support it somehow.
> > >
> > > Technically speaking, we have RexSubQuery.
> > > Haisheng, have you considered an option to stick with RexSubQuery to
> > avoid
> > > having two more-or-less the same rex classes?
> > >
> > > Vladimir
> >
>

Re: [DISCUSS] New RexNode: RexListCmp

Posted by Stamatis Zampetakis <za...@gmail.com>.
Another quick thought as far as it concerns the IN operator would be to use
RexCall as it is right now where the first operand in the list is a
RexInputRef for instance and the rest are the literals.
I assume that taking this direction would need to change a bit the
respective SqlOperator.

I haven't thought of this thoroughly so maybe there are important things
that I am missing.

Best,
Stamatis


On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde <jh...@apache.org> wrote:

> The name isn't very intuitive.
>
> The concept of a list and a comparison operator seems OK. As Vladimir
> points out, it is somewhat similar to RexSubQuery, so maybe this could
> be a sub-class (but organizing the data a bit more efficiently).
>
> I would be very wary of null semantics. RexNode scalar operators are
> forced to do 3-valued logic, but this is almost a relational operator
> and it would be better without that burden.
>
> Julian
>
>
>
> On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
> <si...@gmail.com> wrote:
> >
> > >Do you know what is the impact on Enumerable implementation?
> >
> > I guess there are plenty of options there.
> >
> > The key question regarding RexListCmp is as we introduce a new Rex node,
> > all the planning rules and all engines
> > must support it somehow.
> >
> > Technically speaking, we have RexSubQuery.
> > Haisheng, have you considered an option to stick with RexSubQuery to
> avoid
> > having two more-or-less the same rex classes?
> >
> > Vladimir
>

Re: [DISCUSS] New RexNode: RexListCmp

Posted by Julian Hyde <jh...@apache.org>.
The name isn't very intuitive.

The concept of a list and a comparison operator seems OK. As Vladimir
points out, it is somewhat similar to RexSubQuery, so maybe this could
be a sub-class (but organizing the data a bit more efficiently).

I would be very wary of null semantics. RexNode scalar operators are
forced to do 3-valued logic, but this is almost a relational operator
and it would be better without that burden.

Julian



On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> >Do you know what is the impact on Enumerable implementation?
>
> I guess there are plenty of options there.
>
> The key question regarding RexListCmp is as we introduce a new Rex node,
> all the planning rules and all engines
> must support it somehow.
>
> Technically speaking, we have RexSubQuery.
> Haisheng, have you considered an option to stick with RexSubQuery to avoid
> having two more-or-less the same rex classes?
>
> Vladimir

Re: [DISCUSS] New RexNode: RexListCmp

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Do you know what is the impact on Enumerable implementation?

I guess there are plenty of options there.

The key question regarding RexListCmp is as we introduce a new Rex node,
all the planning rules and all engines
must support it somehow.

Technically speaking, we have RexSubQuery.
Haisheng, have you considered an option to stick with RexSubQuery to avoid
having two more-or-less the same rex classes?

Vladimir

Re: [DISCUSS] New RexNode: RexListCmp

Posted by Rui Wang <am...@apache.org>.
Hi Haisheng,

Do you know what is the impact on Enumerable implementation? Will
RexListCmp still be converted AND/OR something else in that layer or there
will be new logic introduced to generate code for RexListCmp?

-Rui

On Sun, Jul 19, 2020 at 10:13 PM Enrico Olivelli <eo...@gmail.com>
wrote:

> Il Lun 20 Lug 2020, 03:00 Haisheng Yuan <hy...@apache.org> ha scritto:
>
> > Hi all,
> >
> > This is a rough idea, I'd like to see how the community think about it.
> >
> > RexListCmp extends RexNode / RexCall {
> >   public final SqlOperator op;
> >   public final RexNode left;
> >   public final ImmutableList<RexNode> list;
> >   public final RexQuantifier quantifier;
> >   public final RelDataType type;
> > }
> >
> > Enum RexQuantifier {
> >   ALL,
> >   ANY
> > }
> >
> > Background:
> >
> > It is not uncommon that the query contains large number of constant IN
> > list, e.g.
> > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, ...., 10000);
> > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, ...., 10000);
> >
> > Currently, Calcite either translates it into a Join, or expand to OR/AND,
> > which is inefficient, and may cause problems.
> >
>
> Yes. It is not efficient.
>
> I would love to see this new feature
>
> Thanks
> Enrico
>
>
>
>
> > With RexListCmp, the predicate in query 1) will be represented as:
> > RexListCmp {
> >   op = "<>",
> >   left = "a"
> >   list = "1,2,3...10000"
> >   quantifier = "ALL"
> > }
> >
> > The predicate in query 2) will be represented as:
> > RexListCmp {
> >   op = "=",
> >   left = "b"
> >   list = "1,2,3...10000"
> >   quantifier = "ANY"
> > }
> >
> > It may also be used to represent the predicate in the following query:
> >
> > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ...
> (1000,
> > 1000));
> >
> > Further more, it is extensible. The op is not limited to be equals or not
> > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql
> operator
> > like geospatial operator intersect:
> > boolean &&( geometry A , geometry B )
> >
> > Thoughts?
> >
> > Thanks,
> > Haisheng Yuan
> >
> >
> >
>

Re: [DISCUSS] New RexNode: RexListCmp

Posted by Enrico Olivelli <eo...@gmail.com>.
Il Lun 20 Lug 2020, 03:00 Haisheng Yuan <hy...@apache.org> ha scritto:

> Hi all,
>
> This is a rough idea, I'd like to see how the community think about it.
>
> RexListCmp extends RexNode / RexCall {
>   public final SqlOperator op;
>   public final RexNode left;
>   public final ImmutableList<RexNode> list;
>   public final RexQuantifier quantifier;
>   public final RelDataType type;
> }
>
> Enum RexQuantifier {
>   ALL,
>   ANY
> }
>
> Background:
>
> It is not uncommon that the query contains large number of constant IN
> list, e.g.
> 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, ...., 10000);
> 2) SELECT * FROM bar WHERE b IN (1, 2, 3, ...., 10000);
>
> Currently, Calcite either translates it into a Join, or expand to OR/AND,
> which is inefficient, and may cause problems.
>

Yes. It is not efficient.

I would love to see this new feature

Thanks
Enrico




> With RexListCmp, the predicate in query 1) will be represented as:
> RexListCmp {
>   op = "<>",
>   left = "a"
>   list = "1,2,3...10000"
>   quantifier = "ALL"
> }
>
> The predicate in query 2) will be represented as:
> RexListCmp {
>   op = "=",
>   left = "b"
>   list = "1,2,3...10000"
>   quantifier = "ANY"
> }
>
> It may also be used to represent the predicate in the following query:
>
> SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... (1000,
> 1000));
>
> Further more, it is extensible. The op is not limited to be equals or not
> equals, it also be >, <, >=, <=, IDF, INDF or even customized sql operator
> like geospatial operator intersect:
> boolean &&( geometry A , geometry B )
>
> Thoughts?
>
> Thanks,
> Haisheng Yuan
>
>
>