You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2019/10/07 20:53:11 UTC

Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

In https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 <https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> I floated the idea of a “$HARD_IN” internal function that has the same semantics as IN but is not expanded to ‘… = OR … = …’.

I think it would be a useful tool, if used judiciously. 

Julian


> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com> wrote:
> 
> As a workaround, you can modify you SqlRexConverlet, create a RexCall with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> Because every OR RexCall has exactly 2 operands, it won't transform into SqlCall with left deep tree.
> 
> Let me know it works for you or not.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> 日 期:2019年10月05日 07:37:04
> 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<de...@calcite.apache.org>
> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> If you want to push the filter down to the source SQL sytem, then transforming to a join won't help you either.
> 
> The reason of stackoverflow for large ORs is the left deep binary tree, we need to change it to balanced binary tree, to reduce the depth of the call.
> 
> I will open a pull request later.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> 日 期:2019年10月04日 21:32:25
> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Zoltan,
> 
> Thanks for the suggestion. I actually tried doing a UDF first, and it was also successful, sorry for not sharing those details earlier.
> The problem with the UDF is that the predicates are not pushed down to the source SQL system (by design), and this can result in a 100x increase in the amount of data returned from the database. This data will be correctly filtered by the UDF, but returning 100x the data makes it a lot slower. So I was trying to push it down to the source server instead.
> 
> What do you mean by, "I guess Calcite might probably won't be able to do much with these ORs anyway..."? From my experiments I've seen two results from passing in this many OR's:
> 
> - If no other predicates are included in the query, then Calcite succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> - If additional predicates are included, then Calcite nests the OR statements, leading to a stackoverflow for very large OR's, which is CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) 
> 
> Thanks,
> Peter
> 
> -----Original Message-----
> From: Zoltan Haindrich <ki...@rxd.hu> 
> Sent: Friday, October 4, 2019 12:38 AM
> To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>; Peter Wicks (pwicks) <pw...@micron.com>
> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> 
> I think you might try another approach: introduce some UDF and use your translation logic to call that - as the UDF will be opaque for calcite it will be left alone.
> I guess Calcite might probably won't be able to do much with these ORs anyway...
> 
> 
> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>> I don't think this can be done in SqlRexConvertlet, which converts SqlNode to RexNode.
>> You might need to massage the SqlToRelConverter to create the RelNode that you want.
>> 
>> BTW, I still think we need RexNode for IN/ANY.
> I also feel that there is some need for IN nodes; but there are some good sides of not having it as well: like simplification handles them better.
> 
> 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>> 日 期:2019年10月04日 04:03:51
>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Haisheng,
>> 
>> Yes, that is what I would like to do. Unfortunately, I’m not sure how to proceed to actually do that. I was hoping for a pointer to an example that is similar?
>> 
>> Thanks!
>> Peter
>> 
>> From: Haisheng Yuan <h....@alibaba-inc.com>
>> Sent: Thursday, October 3, 2019 1:35 PM
>> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, unfortunately.
>> 
>> You can create a Values node with these authorization data, and make a semi join with the table and Values you created.
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
>> 日 期:2019年10月04日 02:34:02
>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>>
>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> A little detail about what I'm trying to do:
>> 
>> I have an external API that contains authorization information on a per user basis. I want users to be able to include an operation in their query that will filter data based on this authorization data.
>> 
>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements SqlRexConvertlet, and I am able to get this working. The user includes in their predicate statement `custom_authorize(column)`, my convertlet queries the API, gets the authorization rules, builds an OR statement, and the results come back. This works sometimes, but other times the OR condition becomes too large, and I run into CALCITE-2792: https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0, which causes a stackoverflow and my query dies.
>> 
>> So I tried converting to an IN statement, having read that IN statements are automatically converted to a sub query join when the default limit of 20 is exceeded. The problem is that this appears only to be true for IN statements that are included in the initial query. IN statements created as the result of a convertlet do not get modified, and are sent as an IN statement, which results in a failure to parse the query. I looked at how Calcite normally does this translation from IN to exists using a join, but it depends on a lot of classes/instances that aren't available in the SqlRexContext space. Is it possible to rewrite my IN statement to a join/exists query like Calcite normally does?
>> 
>> Also, am I doing things all wrong? Is there a better way to go about this?
>> 
>> Code Sample below is for the OR version, the commented code can be swapped in to see how I was building the IN statement.
>> 
>> @Override
>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>> HashSet<String> keyList = null;
>> try {
>> keyList = new Manager().getAllowedIDs(getContextInformation().getQueryUser());
>> } catch (SQLException e) {
>> e.printStackTrace();
>> }
>> 
>> final RexBuilder rexBuilder = cx.getRexBuilder();
>> final RexNode column = cx.convertExpression(call.operand(0));
>> 
>> final List<RexNode> nodes = new ArrayList<>();
>> for(String s: keyList) {
>> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
>> //nodes.add(rexBuilder.makeLiteral(s));
>> }
>> 
>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new RexNode[0]));
>> 
>> return in;
>> }
>> 
>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... nodes) {
>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>> }
>> 
>> 
>> Thanks,
>> Peter
>> 
> 
> 


Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Julian Hyde <jh...@apache.org>.
There’s not very much difference between supporting IN in RexCalls vs. supporting $HARD_IN in RexCalls. The distinction is important to me because I think it’s important to have a “core” set of operators, and I don’t want to add IN to that core.  Stamatis feels differently, and I respect that.

One thing that makes me thing that IN does not belong in “core” is that there is a generalization range-sets. With IN we can represent “x = 1 or x = 3 or x = 5”, but with range sets we can also represent “x = 1 or x between 3 and 5 or x> 100”. We make heavy use of range sets in DateRangeRules.

Julian


> On Oct 8, 2019, at 2:47 PM, Xiening Dai <xn...@gmail.com> wrote:
> 
> In my opinion, we will need both - supporting IN operation (either through an operator or an internal function) and adding support for building a balanced tree. It’s always good to be resilient and capable of handling edge cases. The IN support might require more work. Haisheng’s proposal is a practical solution to current issue.
> 
>> On Oct 8, 2019, at 11:06 AM, Haisheng Yuan <h....@alibaba-inc.com> wrote:
>> 
>> Adding IN RexNode only partially solves the problem, as it is still masking the underlying issue. The fundamental reason for the stack overflow iies in the left-deep binary tree. For queries that have tens of thousands of OR condition, but not equals, which is not uncommon in our case, e.g.
>> (a like '...') or (b like '...') or (c like '..')
>> there will still be stack overflow. 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Stamatis Zampetakis<za...@gmail.com>
>> 日 期:2019年10月08日 15:09:01
>> 收件人:<de...@calcite.apache.org>
>> 主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> It might be better to add a proper IN operator in RexCalls instead of
>> something internal that does more or less the same thing.
>> It is true that adds more paths in the code and thus requires some
>> additional dev and further support but I think it is worth it.
>> Many people so far expressed an interest to work on various cases involving
>> an IN operator so it might not be long before
>> we have full support for the IN operator.
>> 
>> SqlToRelConverter can still decide to expand or not based on some criterion
>> or property.
>> 
>> 
>> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:
>> 
>>> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
>>> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
>>> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
>>> IN (v1, v2)".
>>> 
>>> At any point in this lifecycle, you could intercept and and simplify.
>>> 
>>> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com>
>>> wrote:
>>>> 
>>>> Will the filter condition with “$HARD_IN” internal function be able to
>>> pushed down and be recognized by the source SQL system, like Peter
>>> mentioned?
>>>> 
>>>> If not, we have to translate the internal function back to IN during
>>> Rel2Sql phase. Otherwise, the data read from the source table can be much
>>> larger.
>>>> 
>>>> - Haisheng
>>>> 
>>>> ------------------------------------------------------------------
>>>> 发件人:Julian Hyde<jh...@apache.org>
>>>> 日 期:2019年10月08日 04:53:11
>>>> 收件人:dev<de...@calcite.apache.org>
>>>> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>>>> 
>>>> In
>>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
>>> <
>>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
>>> I floated the idea of a “$HARD_IN” internal function that has the same
>>> semantics as IN but is not expanded to ‘… = OR … = …’.
>>>> 
>>>> I think it would be a useful tool, if used judiciously.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com>
>>> wrote:
>>>>> 
>>>>> As a workaround, you can modify you SqlRexConverlet, create a RexCall
>>> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
>>> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
>>>>> Because every OR RexCall has exactly 2 operands, it won't transform
>>> into SqlCall with left deep tree.
>>>>> 
>>>>> Let me know it works for you or not.
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Haisheng Yuan<h....@alibaba-inc.com>
>>>>> 日 期:2019年10月05日 07:37:04
>>>>> 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<
>>> dev@calcite.apache.org>
>>>>> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>> 
>>>>> If you want to push the filter down to the source SQL sytem, then
>>> transforming to a join won't help you either.
>>>>> 
>>>>> The reason of stackoverflow for large ORs is the left deep binary
>>> tree, we need to change it to balanced binary tree, to reduce the depth of
>>> the call.
>>>>> 
>>>>> I will open a pull request later.
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>>>>> 日 期:2019年10月04日 21:32:25
>>>>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>> 
>>>>> Zoltan,
>>>>> 
>>>>> Thanks for the suggestion. I actually tried doing a UDF first, and it
>>> was also successful, sorry for not sharing those details earlier.
>>>>> The problem with the UDF is that the predicates are not pushed down to
>>> the source SQL system (by design), and this can result in a 100x increase
>>> in the amount of data returned from the database. This data will be
>>> correctly filtered by the UDF, but returning 100x the data makes it a lot
>>> slower. So I was trying to push it down to the source server instead.
>>>>> 
>>>>> What do you mean by, "I guess Calcite might probably won't be able to
>>> do much with these ORs anyway..."? From my experiments I've seen two
>>> results from passing in this many OR's:
>>>>> 
>>>>> - If no other predicates are included in the query, then Calcite
>>> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
>>>>> - If additional predicates are included, then Calcite nests the OR
>>> statements, leading to a stackoverflow for very large OR's, which is
>>> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
>>>>> 
>>>>> Thanks,
>>>>> Peter
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Zoltan Haindrich <ki...@rxd.hu>
>>>>> Sent: Friday, October 4, 2019 12:38 AM
>>>>> To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>;
>>> Peter Wicks (pwicks) <pw...@micron.com>
>>>>> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
>>> Conversion Logic
>>>>> 
>>>>> 
>>>>> I think you might try another approach: introduce some UDF and use
>>> your translation logic to call that - as the UDF will be opaque for calcite
>>> it will be left alone.
>>>>> I guess Calcite might probably won't be able to do much with these ORs
>>> anyway...
>>>>> 
>>>>> 
>>>>> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>>>>>> I don't think this can be done in SqlRexConvertlet, which converts
>>> SqlNode to RexNode.
>>>>>> You might need to massage the SqlToRelConverter to create the RelNode
>>> that you want.
>>>>>> 
>>>>>> BTW, I still think we need RexNode for IN/ANY.
>>>>> I also feel that there is some need for IN nodes; but there are some
>>> good sides of not having it as well: like simplification handles them
>>> better.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> - Haisheng
>>>>>> 
>>>>>> ------------------------------------------------------------------
>>>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>>>>>> 日 期:2019年10月04日 04:03:51
>>>>>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>>> 
>>>>>> Haisheng,
>>>>>> 
>>>>>> Yes, that is what I would like to do. Unfortunately, I’m not sure how
>>> to proceed to actually do that. I was hoping for a pointer to an example
>>> that is similar?
>>>>>> 
>>>>>> Thanks!
>>>>>> Peter
>>>>>> 
>>>>>> From: Haisheng Yuan <h....@alibaba-inc.com>
>>>>>> Sent: Thursday, October 3, 2019 1:35 PM
>>>>>> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
>>>>>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>>> Logic
>>>>>> 
>>>>>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
>>> unfortunately.
>>>>>> 
>>>>>> You can create a Values node with these authorization data, and make
>>> a semi join with the table and Values you created.
>>>>>> 
>>>>>> - Haisheng
>>>>>> 
>>>>>> ------------------------------------------------------------------
>>>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
>>>>>> 日 期:2019年10月04日 02:34:02
>>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
>>> dev@calcite.apache.org%3cdev@calcite.apache.org>>
>>>>>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>>>>>> 
>>>>>> A little detail about what I'm trying to do:
>>>>>> 
>>>>>> I have an external API that contains authorization information on a
>>> per user basis. I want users to be able to include an operation in their
>>> query that will filter data based on this authorization data.
>>>>>> 
>>>>>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
>>> implements SqlRexConvertlet, and I am able to get this working. The user
>>> includes in their predicate statement `custom_authorize(column)`, my
>>> convertlet queries the API, gets the authorization rules, builds an OR
>>> statement, and the results come back. This works sometimes, but other times
>>> the OR condition becomes too large, and I run into CALCITE-2792:
>>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>>> which causes a stackoverflow and my query dies.
>>>>>> 
>>>>>> So I tried converting to an IN statement, having read that IN
>>> statements are automatically converted to a sub query join when the default
>>> limit of 20 is exceeded. The problem is that this appears only to be true
>>> for IN statements that are included in the initial query. IN statements
>>> created as the result of a convertlet do not get modified, and are sent as
>>> an IN statement, which results in a failure to parse the query. I looked at
>>> how Calcite normally does this translation from IN to exists using a join,
>>> but it depends on a lot of classes/instances that aren't available in the
>>> SqlRexContext space. Is it possible to rewrite my IN statement to a
>>> join/exists query like Calcite normally does?
>>>>>> 
>>>>>> Also, am I doing things all wrong? Is there a better way to go about
>>> this?
>>>>>> 
>>>>>> Code Sample below is for the OR version, the commented code can be
>>> swapped in to see how I was building the IN statement.
>>>>>> 
>>>>>> @Override
>>>>>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>>>>>> HashSet<String> keyList = null;
>>>>>> try {
>>>>>> keyList = new
>>> Manager().getAllowedIDs(getContextInformation().getQueryUser());
>>>>>> } catch (SQLException e) {
>>>>>> e.printStackTrace();
>>>>>> }
>>>>>> 
>>>>>> final RexBuilder rexBuilder = cx.getRexBuilder();
>>>>>> final RexNode column = cx.convertExpression(call.operand(0));
>>>>>> 
>>>>>> final List<RexNode> nodes = new ArrayList<>();
>>>>>> for(String s: keyList) {
>>>>>> nodes.add(rexBuilder.makeCall(EQUALS, column,
>>> rexBuilder.makeLiteral(s)));
>>>>>> //nodes.add(rexBuilder.makeLiteral(s));
>>>>>> }
>>>>>> 
>>>>>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>>>>>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
>>> RexNode[0]));
>>>>>> 
>>>>>> return in;
>>>>>> }
>>>>>> 
>>>>>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
>>> RexNode... nodes) {
>>>>>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>>>>>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Peter
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Xiening Dai <xn...@gmail.com>.
In my opinion, we will need both - supporting IN operation (either through an operator or an internal function) and adding support for building a balanced tree. It’s always good to be resilient and capable of handling edge cases. The IN support might require more work. Haisheng’s proposal is a practical solution to current issue.

> On Oct 8, 2019, at 11:06 AM, Haisheng Yuan <h....@alibaba-inc.com> wrote:
> 
> Adding IN RexNode only partially solves the problem, as it is still masking the underlying issue. The fundamental reason for the stack overflow iies in the left-deep binary tree. For queries that have tens of thousands of OR condition, but not equals, which is not uncommon in our case, e.g.
> (a like '...') or (b like '...') or (c like '..')
> there will still be stack overflow. 
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Stamatis Zampetakis<za...@gmail.com>
> 日 期:2019年10月08日 15:09:01
> 收件人:<de...@calcite.apache.org>
> 主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> It might be better to add a proper IN operator in RexCalls instead of
> something internal that does more or less the same thing.
> It is true that adds more paths in the code and thus requires some
> additional dev and further support but I think it is worth it.
> Many people so far expressed an interest to work on various cases involving
> an IN operator so it might not be long before
> we have full support for the IN operator.
> 
> SqlToRelConverter can still decide to expand or not based on some criterion
> or property.
> 
> 
> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:
> 
>> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
>> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
>> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
>> IN (v1, v2)".
>> 
>> At any point in this lifecycle, you could intercept and and simplify.
>> 
>> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com>
>> wrote:
>>> 
>>> Will the filter condition with “$HARD_IN” internal function be able to
>> pushed down and be recognized by the source SQL system, like Peter
>> mentioned?
>>> 
>>> If not, we have to translate the internal function back to IN during
>> Rel2Sql phase. Otherwise, the data read from the source table can be much
>> larger.
>>> 
>>> - Haisheng
>>> 
>>> ------------------------------------------------------------------
>>> 发件人:Julian Hyde<jh...@apache.org>
>>> 日 期:2019年10月08日 04:53:11
>>> 收件人:dev<de...@calcite.apache.org>
>>> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>>> 
>>> In
>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
>> <
>> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
>> I floated the idea of a “$HARD_IN” internal function that has the same
>> semantics as IN but is not expanded to ‘… = OR … = …’.
>>> 
>>> I think it would be a useful tool, if used judiciously.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com>
>> wrote:
>>>> 
>>>> As a workaround, you can modify you SqlRexConverlet, create a RexCall
>> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
>> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
>>>> Because every OR RexCall has exactly 2 operands, it won't transform
>> into SqlCall with left deep tree.
>>>> 
>>>> Let me know it works for you or not.
>>>> 
>>>> - Haisheng
>>>> 
>>>> ------------------------------------------------------------------
>>>> 发件人:Haisheng Yuan<h....@alibaba-inc.com>
>>>> 日 期:2019年10月05日 07:37:04
>>>> 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<
>> dev@calcite.apache.org>
>>>> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>> Logic
>>>> 
>>>> If you want to push the filter down to the source SQL sytem, then
>> transforming to a join won't help you either.
>>>> 
>>>> The reason of stackoverflow for large ORs is the left deep binary
>> tree, we need to change it to balanced binary tree, to reduce the depth of
>> the call.
>>>> 
>>>> I will open a pull request later.
>>>> 
>>>> - Haisheng
>>>> 
>>>> ------------------------------------------------------------------
>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>>>> 日 期:2019年10月04日 21:32:25
>>>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>> Logic
>>>> 
>>>> Zoltan,
>>>> 
>>>> Thanks for the suggestion. I actually tried doing a UDF first, and it
>> was also successful, sorry for not sharing those details earlier.
>>>> The problem with the UDF is that the predicates are not pushed down to
>> the source SQL system (by design), and this can result in a 100x increase
>> in the amount of data returned from the database. This data will be
>> correctly filtered by the UDF, but returning 100x the data makes it a lot
>> slower. So I was trying to push it down to the source server instead.
>>>> 
>>>> What do you mean by, "I guess Calcite might probably won't be able to
>> do much with these ORs anyway..."? From my experiments I've seen two
>> results from passing in this many OR's:
>>>> 
>>>> - If no other predicates are included in the query, then Calcite
>> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
>>>> - If additional predicates are included, then Calcite nests the OR
>> statements, leading to a stackoverflow for very large OR's, which is
>> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
>>>> 
>>>> Thanks,
>>>> Peter
>>>> 
>>>> -----Original Message-----
>>>> From: Zoltan Haindrich <ki...@rxd.hu>
>>>> Sent: Friday, October 4, 2019 12:38 AM
>>>> To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>;
>> Peter Wicks (pwicks) <pw...@micron.com>
>>>> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
>> Conversion Logic
>>>> 
>>>> 
>>>> I think you might try another approach: introduce some UDF and use
>> your translation logic to call that - as the UDF will be opaque for calcite
>> it will be left alone.
>>>> I guess Calcite might probably won't be able to do much with these ORs
>> anyway...
>>>> 
>>>> 
>>>> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>>>>> I don't think this can be done in SqlRexConvertlet, which converts
>> SqlNode to RexNode.
>>>>> You might need to massage the SqlToRelConverter to create the RelNode
>> that you want.
>>>>> 
>>>>> BTW, I still think we need RexNode for IN/ANY.
>>>> I also feel that there is some need for IN nodes; but there are some
>> good sides of not having it as well: like simplification handles them
>> better.
>>>> 
>>>> 
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>>>>> 日 期:2019年10月04日 04:03:51
>>>>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>> Logic
>>>>> 
>>>>> Haisheng,
>>>>> 
>>>>> Yes, that is what I would like to do. Unfortunately, I’m not sure how
>> to proceed to actually do that. I was hoping for a pointer to an example
>> that is similar?
>>>>> 
>>>>> Thanks!
>>>>> Peter
>>>>> 
>>>>> From: Haisheng Yuan <h....@alibaba-inc.com>
>>>>> Sent: Thursday, October 3, 2019 1:35 PM
>>>>> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
>>>>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
>> Logic
>>>>> 
>>>>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
>> unfortunately.
>>>>> 
>>>>> You can create a Values node with these authorization data, and make
>> a semi join with the table and Values you created.
>>>>> 
>>>>> - Haisheng
>>>>> 
>>>>> ------------------------------------------------------------------
>>>>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
>>>>> 日 期:2019年10月04日 02:34:02
>>>>> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
>> dev@calcite.apache.org%3cdev@calcite.apache.org>>
>>>>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>>>>> 
>>>>> A little detail about what I'm trying to do:
>>>>> 
>>>>> I have an external API that contains authorization information on a
>> per user basis. I want users to be able to include an operation in their
>> query that will filter data based on this authorization data.
>>>>> 
>>>>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
>> implements SqlRexConvertlet, and I am able to get this working. The user
>> includes in their predicate statement `custom_authorize(column)`, my
>> convertlet queries the API, gets the authorization rules, builds an OR
>> statement, and the results come back. This works sometimes, but other times
>> the OR condition becomes too large, and I run into CALCITE-2792:
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>> which causes a stackoverflow and my query dies.
>>>>> 
>>>>> So I tried converting to an IN statement, having read that IN
>> statements are automatically converted to a sub query join when the default
>> limit of 20 is exceeded. The problem is that this appears only to be true
>> for IN statements that are included in the initial query. IN statements
>> created as the result of a convertlet do not get modified, and are sent as
>> an IN statement, which results in a failure to parse the query. I looked at
>> how Calcite normally does this translation from IN to exists using a join,
>> but it depends on a lot of classes/instances that aren't available in the
>> SqlRexContext space. Is it possible to rewrite my IN statement to a
>> join/exists query like Calcite normally does?
>>>>> 
>>>>> Also, am I doing things all wrong? Is there a better way to go about
>> this?
>>>>> 
>>>>> Code Sample below is for the OR version, the commented code can be
>> swapped in to see how I was building the IN statement.
>>>>> 
>>>>> @Override
>>>>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>>>>> HashSet<String> keyList = null;
>>>>> try {
>>>>> keyList = new
>> Manager().getAllowedIDs(getContextInformation().getQueryUser());
>>>>> } catch (SQLException e) {
>>>>> e.printStackTrace();
>>>>> }
>>>>> 
>>>>> final RexBuilder rexBuilder = cx.getRexBuilder();
>>>>> final RexNode column = cx.convertExpression(call.operand(0));
>>>>> 
>>>>> final List<RexNode> nodes = new ArrayList<>();
>>>>> for(String s: keyList) {
>>>>> nodes.add(rexBuilder.makeCall(EQUALS, column,
>> rexBuilder.makeLiteral(s)));
>>>>> //nodes.add(rexBuilder.makeLiteral(s));
>>>>> }
>>>>> 
>>>>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>>>>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
>> RexNode[0]));
>>>>> 
>>>>> return in;
>>>>> }
>>>>> 
>>>>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
>> RexNode... nodes) {
>>>>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>>>>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>>>>> }
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Peter
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 


Re: Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Haisheng Yuan <h....@alibaba-inc.com>.
Adding IN RexNode only partially solves the problem, as it is still masking the underlying issue. The fundamental reason for the stack overflow iies in the left-deep binary tree. For queries that have tens of thousands of OR condition, but not equals, which is not uncommon in our case, e.g.
(a like '...') or (b like '...') or (c like '..')
there will still be stack overflow. 

- Haisheng

------------------------------------------------------------------
发件人:Stamatis Zampetakis<za...@gmail.com>
日 期:2019年10月08日 15:09:01
收件人:<de...@calcite.apache.org>
主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

It might be better to add a proper IN operator in RexCalls instead of
something internal that does more or less the same thing.
It is true that adds more paths in the code and thus requires some
additional dev and further support but I think it is worth it.
Many people so far expressed an interest to work on various cases involving
an IN operator so it might not be long before
we have full support for the IN operator.

SqlToRelConverter can still decide to expand or not based on some criterion
or property.


On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:

> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
> IN (v1, v2)".
>
> At any point in this lifecycle, you could intercept and and simplify.
>
> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
> >
> > Will the filter condition with “$HARD_IN” internal function be able to
> pushed down and be recognized by the source SQL system, like Peter
> mentioned?
> >
> > If not, we have to translate the internal function back to IN during
> Rel2Sql phase. Otherwise, the data read from the source table can be much
> larger.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Julian Hyde<jh...@apache.org>
> > 日 期:2019年10月08日 04:53:11
> > 收件人:dev<de...@calcite.apache.org>
> > 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
> >
> > In
> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
> <
> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
> I floated the idea of a “$HARD_IN” internal function that has the same
> semantics as IN but is not expanded to ‘… = OR … = …’.
> >
> > I think it would be a useful tool, if used judiciously.
> >
> > Julian
> >
> >
> > > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
> > >
> > > As a workaround, you can modify you SqlRexConverlet, create a RexCall
> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> > > Because every OR RexCall has exactly 2 operands, it won't transform
> into SqlCall with left deep tree.
> > >
> > > Let me know it works for you or not.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> > > 日 期:2019年10月05日 07:37:04
> > > 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<
> dev@calcite.apache.org>
> > > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >
> > > If you want to push the filter down to the source SQL sytem, then
> transforming to a join won't help you either.
> > >
> > > The reason of stackoverflow for large ORs is the left deep binary
> tree, we need to change it to balanced binary tree, to reduce the depth of
> the call.
> > >
> > > I will open a pull request later.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > > 日 期:2019年10月04日 21:32:25
> > > 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >
> > > Zoltan,
> > >
> > > Thanks for the suggestion. I actually tried doing a UDF first, and it
> was also successful, sorry for not sharing those details earlier.
> > > The problem with the UDF is that the predicates are not pushed down to
> the source SQL system (by design), and this can result in a 100x increase
> in the amount of data returned from the database. This data will be
> correctly filtered by the UDF, but returning 100x the data makes it a lot
> slower. So I was trying to push it down to the source server instead.
> > >
> > > What do you mean by, "I guess Calcite might probably won't be able to
> do much with these ORs anyway..."? From my experiments I've seen two
> results from passing in this many OR's:
> > >
> > > - If no other predicates are included in the query, then Calcite
> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> > > - If additional predicates are included, then Calcite nests the OR
> statements, leading to a stackoverflow for very large OR's, which is
> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
> > >
> > > Thanks,
> > > Peter
> > >
> > > -----Original Message-----
> > > From: Zoltan Haindrich <ki...@rxd.hu>
> > > Sent: Friday, October 4, 2019 12:38 AM
> > > To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>;
> Peter Wicks (pwicks) <pw...@micron.com>
> > > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
> Conversion Logic
> > >
> > >
> > > I think you might try another approach: introduce some UDF and use
> your translation logic to call that - as the UDF will be opaque for calcite
> it will be left alone.
> > > I guess Calcite might probably won't be able to do much with these ORs
> anyway...
> > >
> > >
> > > On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> > >> I don't think this can be done in SqlRexConvertlet, which converts
> SqlNode to RexNode.
> > >> You might need to massage the SqlToRelConverter to create the RelNode
> that you want.
> > >>
> > >> BTW, I still think we need RexNode for IN/ANY.
> > > I also feel that there is some need for IN nodes; but there are some
> good sides of not having it as well: like simplification handles them
> better.
> > >
> > >
> > >>
> > >> - Haisheng
> > >>
> > >> ------------------------------------------------------------------
> > >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > >> 日 期:2019年10月04日 04:03:51
> > >> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > >> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >>
> > >> Haisheng,
> > >>
> > >> Yes, that is what I would like to do. Unfortunately, I’m not sure how
> to proceed to actually do that. I was hoping for a pointer to an example
> that is similar?
> > >>
> > >> Thanks!
> > >> Peter
> > >>
> > >> From: Haisheng Yuan <h....@alibaba-inc.com>
> > >> Sent: Thursday, October 3, 2019 1:35 PM
> > >> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
> > >> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >>
> > >> Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
> unfortunately.
> > >>
> > >> You can create a Values node with these authorization data, and make
> a semi join with the table and Values you created.
> > >>
> > >> - Haisheng
> > >>
> > >> ------------------------------------------------------------------
> > >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
> > >> 日 期:2019年10月04日 02:34:02
> > >> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
> dev@calcite.apache.org%3cdev@calcite.apache.org>>
> > >> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> > >>
> > >> A little detail about what I'm trying to do:
> > >>
> > >> I have an external API that contains authorization information on a
> per user basis. I want users to be able to include an operation in their
> query that will filter data based on this authorization data.
> > >>
> > >> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
> implements SqlRexConvertlet, and I am able to get this working. The user
> includes in their predicate statement `custom_authorize(column)`, my
> convertlet queries the API, gets the authorization rules, builds an OR
> statement, and the results come back. This works sometimes, but other times
> the OR condition becomes too large, and I run into CALCITE-2792:
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
> which causes a stackoverflow and my query dies.
> > >>
> > >> So I tried converting to an IN statement, having read that IN
> statements are automatically converted to a sub query join when the default
> limit of 20 is exceeded. The problem is that this appears only to be true
> for IN statements that are included in the initial query. IN statements
> created as the result of a convertlet do not get modified, and are sent as
> an IN statement, which results in a failure to parse the query. I looked at
> how Calcite normally does this translation from IN to exists using a join,
> but it depends on a lot of classes/instances that aren't available in the
> SqlRexContext space. Is it possible to rewrite my IN statement to a
> join/exists query like Calcite normally does?
> > >>
> > >> Also, am I doing things all wrong? Is there a better way to go about
> this?
> > >>
> > >> Code Sample below is for the OR version, the commented code can be
> swapped in to see how I was building the IN statement.
> > >>
> > >> @Override
> > >> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> > >> HashSet<String> keyList = null;
> > >> try {
> > >> keyList = new
> Manager().getAllowedIDs(getContextInformation().getQueryUser());
> > >> } catch (SQLException e) {
> > >> e.printStackTrace();
> > >> }
> > >>
> > >> final RexBuilder rexBuilder = cx.getRexBuilder();
> > >> final RexNode column = cx.convertExpression(call.operand(0));
> > >>
> > >> final List<RexNode> nodes = new ArrayList<>();
> > >> for(String s: keyList) {
> > >> nodes.add(rexBuilder.makeCall(EQUALS, column,
> rexBuilder.makeLiteral(s)));
> > >> //nodes.add(rexBuilder.makeLiteral(s));
> > >> }
> > >>
> > >> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> > >> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
> RexNode[0]));
> > >>
> > >> return in;
> > >> }
> > >>
> > >> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
> RexNode... nodes) {
> > >> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> > >> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> > >> }
> > >>
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >
> > >
> >
> >
>


Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Danny Chan <yu...@gmail.com>.
I have the same feeling with Stamatis, we should have a full support for IN operator. Compared with “IS NOT DISTINCT FROM”, I think they have much in common, we have did some work like CALCITE-3174 to collapse the “IS NOT DISTINCT FROM” operator to make the whole operator can be pushed down.

Although we make heavy use of range sets in DateRangeRules instead of the IN operator itself, IN is a standard sql operator, I’m inclined to make it into the core operators.

Best,
Danny Chan
在 2019年10月8日 +0800 PM3:09,Stamatis Zampetakis <za...@gmail.com>,写道:
> It might be better to add a proper IN operator in RexCalls instead of
> something internal that does more or less the same thing.
> It is true that adds more paths in the code and thus requires some
> additional dev and further support but I think it is worth it.
> Many people so far expressed an interest to work on various cases involving
> an IN operator so it might not be long before
> we have full support for the IN operator.
>
> SqlToRelConverter can still decide to expand or not based on some criterion
> or property.
>
>
> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:
>
> > A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
> > $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
> > $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
> > IN (v1, v2)".
> >
> > At any point in this lifecycle, you could intercept and and simplify.
> >
> > On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com>
> > wrote:
> > >
> > > Will the filter condition with “$HARD_IN” internal function be able to
> > pushed down and be recognized by the source SQL system, like Peter
> > mentioned?
> > >
> > > If not, we have to translate the internal function back to IN during
> > Rel2Sql phase. Otherwise, the data read from the source table can be much
> > larger.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Julian Hyde<jh...@apache.org>
> > > 日 期:2019年10月08日 04:53:11
> > > 收件人:dev<de...@calcite.apache.org>
> > > 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
> > >
> > > In
> > https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
> > <
> > https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
> > I floated the idea of a “$HARD_IN” internal function that has the same
> > semantics as IN but is not expanded to ‘… = OR … = …’.
> > >
> > > I think it would be a useful tool, if used judiciously.
> > >
> > > Julian
> > >
> > >
> > > > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com>
> > wrote:
> > > >
> > > > As a workaround, you can modify you SqlRexConverlet, create a RexCall
> > with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
> > flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> > > > Because every OR RexCall has exactly 2 operands, it won't transform
> > into SqlCall with left deep tree.
> > > >
> > > > Let me know it works for you or not.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> > > > 日 期:2019年10月05日 07:37:04
> > > > 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<
> > dev@calcite.apache.org>
> > > > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > >
> > > > If you want to push the filter down to the source SQL sytem, then
> > transforming to a join won't help you either.
> > > >
> > > > The reason of stackoverflow for large ORs is the left deep binary
> > tree, we need to change it to balanced binary tree, to reduce the depth of
> > the call.
> > > >
> > > > I will open a pull request later.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > > > 日 期:2019年10月04日 21:32:25
> > > > 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > >
> > > > Zoltan,
> > > >
> > > > Thanks for the suggestion. I actually tried doing a UDF first, and it
> > was also successful, sorry for not sharing those details earlier.
> > > > The problem with the UDF is that the predicates are not pushed down to
> > the source SQL system (by design), and this can result in a 100x increase
> > in the amount of data returned from the database. This data will be
> > correctly filtered by the UDF, but returning 100x the data makes it a lot
> > slower. So I was trying to push it down to the source server instead.
> > > >
> > > > What do you mean by, "I guess Calcite might probably won't be able to
> > do much with these ORs anyway..."? From my experiments I've seen two
> > results from passing in this many OR's:
> > > >
> > > > - If no other predicates are included in the query, then Calcite
> > succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> > > > - If additional predicates are included, then Calcite nests the OR
> > statements, leading to a stackoverflow for very large OR's, which is
> > CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > -----Original Message-----
> > > > From: Zoltan Haindrich <ki...@rxd.hu>
> > > > Sent: Friday, October 4, 2019 12:38 AM
> > > > To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>;
> > Peter Wicks (pwicks) <pw...@micron.com>
> > > > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
> > Conversion Logic
> > > >
> > > >
> > > > I think you might try another approach: introduce some UDF and use
> > your translation logic to call that - as the UDF will be opaque for calcite
> > it will be left alone.
> > > > I guess Calcite might probably won't be able to do much with these ORs
> > anyway...
> > > >
> > > >
> > > > On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> > > > > I don't think this can be done in SqlRexConvertlet, which converts
> > SqlNode to RexNode.
> > > > > You might need to massage the SqlToRelConverter to create the RelNode
> > that you want.
> > > > >
> > > > > BTW, I still think we need RexNode for IN/ANY.
> > > > I also feel that there is some need for IN nodes; but there are some
> > good sides of not having it as well: like simplification handles them
> > better.
> > > >
> > > >
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > > > > 日 期:2019年10月04日 04:03:51
> > > > > 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > > > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > > >
> > > > > Haisheng,
> > > > >
> > > > > Yes, that is what I would like to do. Unfortunately, I’m not sure how
> > to proceed to actually do that. I was hoping for a pointer to an example
> > that is similar?
> > > > >
> > > > > Thanks!
> > > > > Peter
> > > > >
> > > > > From: Haisheng Yuan <h....@alibaba-inc.com>
> > > > > Sent: Thursday, October 3, 2019 1:35 PM
> > > > > To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
> > > > > Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > > >
> > > > > Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
> > unfortunately.
> > > > >
> > > > > You can create a Values node with these authorization data, and make
> > a semi join with the table and Values you created.
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
> > > > > 日 期:2019年10月04日 02:34:02
> > > > > 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
> > dev@calcite.apache.org%3cdev@calcite.apache.org>>
> > > > > 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> > > > >
> > > > > A little detail about what I'm trying to do:
> > > > >
> > > > > I have an external API that contains authorization information on a
> > per user basis. I want users to be able to include an operation in their
> > query that will filter data based on this authorization data.
> > > > >
> > > > > Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
> > implements SqlRexConvertlet, and I am able to get this working. The user
> > includes in their predicate statement `custom_authorize(column)`, my
> > convertlet queries the API, gets the authorization rules, builds an OR
> > statement, and the results come back. This works sometimes, but other times
> > the OR condition becomes too large, and I run into CALCITE-2792:
> > https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
> > which causes a stackoverflow and my query dies.
> > > > >
> > > > > So I tried converting to an IN statement, having read that IN
> > statements are automatically converted to a sub query join when the default
> > limit of 20 is exceeded. The problem is that this appears only to be true
> > for IN statements that are included in the initial query. IN statements
> > created as the result of a convertlet do not get modified, and are sent as
> > an IN statement, which results in a failure to parse the query. I looked at
> > how Calcite normally does this translation from IN to exists using a join,
> > but it depends on a lot of classes/instances that aren't available in the
> > SqlRexContext space. Is it possible to rewrite my IN statement to a
> > join/exists query like Calcite normally does?
> > > > >
> > > > > Also, am I doing things all wrong? Is there a better way to go about
> > this?
> > > > >
> > > > > Code Sample below is for the OR version, the commented code can be
> > swapped in to see how I was building the IN statement.
> > > > >
> > > > > @Override
> > > > > public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> > > > > HashSet<String> keyList = null;
> > > > > try {
> > > > > keyList = new
> > Manager().getAllowedIDs(getContextInformation().getQueryUser());
> > > > > } catch (SQLException e) {
> > > > > e.printStackTrace();
> > > > > }
> > > > >
> > > > > final RexBuilder rexBuilder = cx.getRexBuilder();
> > > > > final RexNode column = cx.convertExpression(call.operand(0));
> > > > >
> > > > > final List<RexNode> nodes = new ArrayList<>();
> > > > > for(String s: keyList) {
> > > > > nodes.add(rexBuilder.makeCall(EQUALS, column,
> > rexBuilder.makeLiteral(s)));
> > > > > //nodes.add(rexBuilder.makeLiteral(s));
> > > > > }
> > > > >
> > > > > final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> > > > > //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
> > RexNode[0]));
> > > > >
> > > > > return in;
> > > > > }
> > > > >
> > > > > protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
> > RexNode... nodes) {
> > > > > return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> > > > > ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> > > > > }
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > >
> > > >
> > >
> > >
> >

Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Stamatis Zampetakis <za...@gmail.com>.
It might be better to add a proper IN operator in RexCalls instead of
something internal that does more or less the same thing.
It is true that adds more paths in the code and thus requires some
additional dev and further support but I think it is worth it.
Many people so far expressed an interest to work on various cases involving
an IN operator so it might not be long before
we have full support for the IN operator.

SqlToRelConverter can still decide to expand or not based on some criterion
or property.


On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jh...@apache.org> wrote:

> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
> IN (v1, v2)".
>
> At any point in this lifecycle, you could intercept and and simplify.
>
> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
> >
> > Will the filter condition with “$HARD_IN” internal function be able to
> pushed down and be recognized by the source SQL system, like Peter
> mentioned?
> >
> > If not, we have to translate the internal function back to IN during
> Rel2Sql phase. Otherwise, the data read from the source table can be much
> larger.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Julian Hyde<jh...@apache.org>
> > 日 期:2019年10月08日 04:53:11
> > 收件人:dev<de...@calcite.apache.org>
> > 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
> >
> > In
> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
> <
> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
> I floated the idea of a “$HARD_IN” internal function that has the same
> semantics as IN but is not expanded to ‘… = OR … = …’.
> >
> > I think it would be a useful tool, if used judiciously.
> >
> > Julian
> >
> >
> > > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com>
> wrote:
> > >
> > > As a workaround, you can modify you SqlRexConverlet, create a RexCall
> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> > > Because every OR RexCall has exactly 2 operands, it won't transform
> into SqlCall with left deep tree.
> > >
> > > Let me know it works for you or not.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> > > 日 期:2019年10月05日 07:37:04
> > > 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<
> dev@calcite.apache.org>
> > > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >
> > > If you want to push the filter down to the source SQL sytem, then
> transforming to a join won't help you either.
> > >
> > > The reason of stackoverflow for large ORs is the left deep binary
> tree, we need to change it to balanced binary tree, to reduce the depth of
> the call.
> > >
> > > I will open a pull request later.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > > 日 期:2019年10月04日 21:32:25
> > > 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >
> > > Zoltan,
> > >
> > > Thanks for the suggestion. I actually tried doing a UDF first, and it
> was also successful, sorry for not sharing those details earlier.
> > > The problem with the UDF is that the predicates are not pushed down to
> the source SQL system (by design), and this can result in a 100x increase
> in the amount of data returned from the database. This data will be
> correctly filtered by the UDF, but returning 100x the data makes it a lot
> slower. So I was trying to push it down to the source server instead.
> > >
> > > What do you mean by, "I guess Calcite might probably won't be able to
> do much with these ORs anyway..."? From my experiments I've seen two
> results from passing in this many OR's:
> > >
> > > - If no other predicates are included in the query, then Calcite
> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> > > - If additional predicates are included, then Calcite nests the OR
> statements, leading to a stackoverflow for very large OR's, which is
> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
> > >
> > > Thanks,
> > > Peter
> > >
> > > -----Original Message-----
> > > From: Zoltan Haindrich <ki...@rxd.hu>
> > > Sent: Friday, October 4, 2019 12:38 AM
> > > To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>;
> Peter Wicks (pwicks) <pw...@micron.com>
> > > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
> Conversion Logic
> > >
> > >
> > > I think you might try another approach: introduce some UDF and use
> your translation logic to call that - as the UDF will be opaque for calcite
> it will be left alone.
> > > I guess Calcite might probably won't be able to do much with these ORs
> anyway...
> > >
> > >
> > > On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> > >> I don't think this can be done in SqlRexConvertlet, which converts
> SqlNode to RexNode.
> > >> You might need to massage the SqlToRelConverter to create the RelNode
> that you want.
> > >>
> > >> BTW, I still think we need RexNode for IN/ANY.
> > > I also feel that there is some need for IN nodes; but there are some
> good sides of not having it as well: like simplification handles them
> better.
> > >
> > >
> > >>
> > >> - Haisheng
> > >>
> > >> ------------------------------------------------------------------
> > >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > >> 日 期:2019年10月04日 04:03:51
> > >> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > >> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >>
> > >> Haisheng,
> > >>
> > >> Yes, that is what I would like to do. Unfortunately, I’m not sure how
> to proceed to actually do that. I was hoping for a pointer to an example
> that is similar?
> > >>
> > >> Thanks!
> > >> Peter
> > >>
> > >> From: Haisheng Yuan <h....@alibaba-inc.com>
> > >> Sent: Thursday, October 3, 2019 1:35 PM
> > >> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
> > >> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> Logic
> > >>
> > >> Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
> unfortunately.
> > >>
> > >> You can create a Values node with these authorization data, and make
> a semi join with the table and Values you created.
> > >>
> > >> - Haisheng
> > >>
> > >> ------------------------------------------------------------------
> > >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
> > >> 日 期:2019年10月04日 02:34:02
> > >> 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
> dev@calcite.apache.org%3cdev@calcite.apache.org>>
> > >> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> > >>
> > >> A little detail about what I'm trying to do:
> > >>
> > >> I have an external API that contains authorization information on a
> per user basis. I want users to be able to include an operation in their
> query that will filter data based on this authorization data.
> > >>
> > >> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
> implements SqlRexConvertlet, and I am able to get this working. The user
> includes in their predicate statement `custom_authorize(column)`, my
> convertlet queries the API, gets the authorization rules, builds an OR
> statement, and the results come back. This works sometimes, but other times
> the OR condition becomes too large, and I run into CALCITE-2792:
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
> which causes a stackoverflow and my query dies.
> > >>
> > >> So I tried converting to an IN statement, having read that IN
> statements are automatically converted to a sub query join when the default
> limit of 20 is exceeded. The problem is that this appears only to be true
> for IN statements that are included in the initial query. IN statements
> created as the result of a convertlet do not get modified, and are sent as
> an IN statement, which results in a failure to parse the query. I looked at
> how Calcite normally does this translation from IN to exists using a join,
> but it depends on a lot of classes/instances that aren't available in the
> SqlRexContext space. Is it possible to rewrite my IN statement to a
> join/exists query like Calcite normally does?
> > >>
> > >> Also, am I doing things all wrong? Is there a better way to go about
> this?
> > >>
> > >> Code Sample below is for the OR version, the commented code can be
> swapped in to see how I was building the IN statement.
> > >>
> > >> @Override
> > >> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> > >> HashSet<String> keyList = null;
> > >> try {
> > >> keyList = new
> Manager().getAllowedIDs(getContextInformation().getQueryUser());
> > >> } catch (SQLException e) {
> > >> e.printStackTrace();
> > >> }
> > >>
> > >> final RexBuilder rexBuilder = cx.getRexBuilder();
> > >> final RexNode column = cx.convertExpression(call.operand(0));
> > >>
> > >> final List<RexNode> nodes = new ArrayList<>();
> > >> for(String s: keyList) {
> > >> nodes.add(rexBuilder.makeCall(EQUALS, column,
> rexBuilder.makeLiteral(s)));
> > >> //nodes.add(rexBuilder.makeLiteral(s));
> > >> }
> > >>
> > >> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> > >> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
> RexNode[0]));
> > >>
> > >> return in;
> > >> }
> > >>
> > >> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
> RexNode... nodes) {
> > >> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> > >> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> > >> }
> > >>
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >
> > >
> >
> >
>

Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Julian Hyde <jh...@apache.org>.
A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
$HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
$HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
IN (v1, v2)".

At any point in this lifecycle, you could intercept and and simplify.

On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h....@alibaba-inc.com> wrote:
>
> Will the filter condition with “$HARD_IN” internal function be able to pushed down and be recognized by the source SQL system, like Peter mentioned?
>
> If not, we have to translate the internal function back to IN during Rel2Sql phase. Otherwise, the data read from the source table can be much larger.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人:Julian Hyde<jh...@apache.org>
> 日 期:2019年10月08日 04:53:11
> 收件人:dev<de...@calcite.apache.org>
> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
>
> In https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 <https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> I floated the idea of a “$HARD_IN” internal function that has the same semantics as IN but is not expanded to ‘… = OR … = …’.
>
> I think it would be a useful tool, if used judiciously.
>
> Julian
>
>
> > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com> wrote:
> >
> > As a workaround, you can modify you SqlRexConverlet, create a RexCall with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> > Because every OR RexCall has exactly 2 operands, it won't transform into SqlCall with left deep tree.
> >
> > Let me know it works for you or not.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> > 日 期:2019年10月05日 07:37:04
> > 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<de...@calcite.apache.org>
> > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> >
> > If you want to push the filter down to the source SQL sytem, then transforming to a join won't help you either.
> >
> > The reason of stackoverflow for large ORs is the left deep binary tree, we need to change it to balanced binary tree, to reduce the depth of the call.
> >
> > I will open a pull request later.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> > 日 期:2019年10月04日 21:32:25
> > 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> >
> > Zoltan,
> >
> > Thanks for the suggestion. I actually tried doing a UDF first, and it was also successful, sorry for not sharing those details earlier.
> > The problem with the UDF is that the predicates are not pushed down to the source SQL system (by design), and this can result in a 100x increase in the amount of data returned from the database. This data will be correctly filtered by the UDF, but returning 100x the data makes it a lot slower. So I was trying to push it down to the source server instead.
> >
> > What do you mean by, "I guess Calcite might probably won't be able to do much with these ORs anyway..."? From my experiments I've seen two results from passing in this many OR's:
> >
> > - If no other predicates are included in the query, then Calcite succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> > - If additional predicates are included, then Calcite nests the OR statements, leading to a stackoverflow for very large OR's, which is CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
> >
> > Thanks,
> > Peter
> >
> > -----Original Message-----
> > From: Zoltan Haindrich <ki...@rxd.hu>
> > Sent: Friday, October 4, 2019 12:38 AM
> > To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>; Peter Wicks (pwicks) <pw...@micron.com>
> > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> >
> >
> > I think you might try another approach: introduce some UDF and use your translation logic to call that - as the UDF will be opaque for calcite it will be left alone.
> > I guess Calcite might probably won't be able to do much with these ORs anyway...
> >
> >
> > On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> >> I don't think this can be done in SqlRexConvertlet, which converts SqlNode to RexNode.
> >> You might need to massage the SqlToRelConverter to create the RelNode that you want.
> >>
> >> BTW, I still think we need RexNode for IN/ANY.
> > I also feel that there is some need for IN nodes; but there are some good sides of not having it as well: like simplification handles them better.
> >
> >
> >>
> >> - Haisheng
> >>
> >> ------------------------------------------------------------------
> >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> >> 日 期:2019年10月04日 04:03:51
> >> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> >> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> >>
> >> Haisheng,
> >>
> >> Yes, that is what I would like to do. Unfortunately, I’m not sure how to proceed to actually do that. I was hoping for a pointer to an example that is similar?
> >>
> >> Thanks!
> >> Peter
> >>
> >> From: Haisheng Yuan <h....@alibaba-inc.com>
> >> Sent: Thursday, October 3, 2019 1:35 PM
> >> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
> >> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> >>
> >> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, unfortunately.
> >>
> >> You can create a Values node with these authorization data, and make a semi join with the table and Values you created.
> >>
> >> - Haisheng
> >>
> >> ------------------------------------------------------------------
> >> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
> >> 日 期:2019年10月04日 02:34:02
> >> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>>
> >> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> >>
> >> A little detail about what I'm trying to do:
> >>
> >> I have an external API that contains authorization information on a per user basis. I want users to be able to include an operation in their query that will filter data based on this authorization data.
> >>
> >> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements SqlRexConvertlet, and I am able to get this working. The user includes in their predicate statement `custom_authorize(column)`, my convertlet queries the API, gets the authorization rules, builds an OR statement, and the results come back. This works sometimes, but other times the OR condition becomes too large, and I run into CALCITE-2792: https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0, which causes a stackoverflow and my query dies.
> >>
> >> So I tried converting to an IN statement, having read that IN statements are automatically converted to a sub query join when the default limit of 20 is exceeded. The problem is that this appears only to be true for IN statements that are included in the initial query. IN statements created as the result of a convertlet do not get modified, and are sent as an IN statement, which results in a failure to parse the query. I looked at how Calcite normally does this translation from IN to exists using a join, but it depends on a lot of classes/instances that aren't available in the SqlRexContext space. Is it possible to rewrite my IN statement to a join/exists query like Calcite normally does?
> >>
> >> Also, am I doing things all wrong? Is there a better way to go about this?
> >>
> >> Code Sample below is for the OR version, the commented code can be swapped in to see how I was building the IN statement.
> >>
> >> @Override
> >> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> >> HashSet<String> keyList = null;
> >> try {
> >> keyList = new Manager().getAllowedIDs(getContextInformation().getQueryUser());
> >> } catch (SQLException e) {
> >> e.printStackTrace();
> >> }
> >>
> >> final RexBuilder rexBuilder = cx.getRexBuilder();
> >> final RexNode column = cx.convertExpression(call.operand(0));
> >>
> >> final List<RexNode> nodes = new ArrayList<>();
> >> for(String s: keyList) {
> >> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
> >> //nodes.add(rexBuilder.makeLiteral(s));
> >> }
> >>
> >> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> >> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new RexNode[0]));
> >>
> >> return in;
> >> }
> >>
> >> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... nodes) {
> >> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> >> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> >> }
> >>
> >>
> >> Thanks,
> >> Peter
> >>
> >
> >
>
>

Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

Posted by Haisheng Yuan <h....@alibaba-inc.com>.
Will the filter condition with “$HARD_IN” internal function be able to pushed down and be recognized by the source SQL system, like Peter mentioned?

If not, we have to translate the internal function back to IN during Rel2Sql phase. Otherwise, the data read from the source table can be much larger.

- Haisheng

------------------------------------------------------------------
发件人:Julian Hyde<jh...@apache.org>
日 期:2019年10月08日 04:53:11
收件人:dev<de...@calcite.apache.org>
主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

In https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 <https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> I floated the idea of a “$HARD_IN” internal function that has the same semantics as IN but is not expanded to ‘… = OR … = …’.

I think it would be a useful tool, if used judiciously. 

Julian


> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h....@alibaba-inc.com> wrote:
> 
> As a workaround, you can modify you SqlRexConverlet, create a RexCall with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> Because every OR RexCall has exactly 2 operands, it won't transform into SqlCall with left deep tree.
> 
> Let me know it works for you or not.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Haisheng Yuan<h....@alibaba-inc.com>
> 日 期:2019年10月05日 07:37:04
> 收件人:Peter Wicks (pwicks)<pw...@micron.com>; dev@calcite.apache.org<de...@calcite.apache.org>
> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> If you want to push the filter down to the source SQL sytem, then transforming to a join won't help you either.
> 
> The reason of stackoverflow for large ORs is the left deep binary tree, we need to change it to balanced binary tree, to reduce the depth of the call.
> 
> I will open a pull request later.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
> 日 期:2019年10月04日 21:32:25
> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Zoltan,
> 
> Thanks for the suggestion. I actually tried doing a UDF first, and it was also successful, sorry for not sharing those details earlier.
> The problem with the UDF is that the predicates are not pushed down to the source SQL system (by design), and this can result in a 100x increase in the amount of data returned from the database. This data will be correctly filtered by the UDF, but returning 100x the data makes it a lot slower. So I was trying to push it down to the source server instead.
> 
> What do you mean by, "I guess Calcite might probably won't be able to do much with these ORs anyway..."? From my experiments I've seen two results from passing in this many OR's:
> 
> - If no other predicates are included in the query, then Calcite succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> - If additional predicates are included, then Calcite nests the OR statements, leading to a stackoverflow for very large OR's, which is CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) 
> 
> Thanks,
> Peter
> 
> -----Original Message-----
> From: Zoltan Haindrich <ki...@rxd.hu> 
> Sent: Friday, October 4, 2019 12:38 AM
> To: dev@calcite.apache.org; Haisheng Yuan <h....@alibaba-inc.com>; Peter Wicks (pwicks) <pw...@micron.com>
> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> 
> I think you might try another approach: introduce some UDF and use your translation logic to call that - as the UDF will be opaque for calcite it will be left alone.
> I guess Calcite might probably won't be able to do much with these ORs anyway...
> 
> 
> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>> I don't think this can be done in SqlRexConvertlet, which converts SqlNode to RexNode.
>> You might need to massage the SqlToRelConverter to create the RelNode that you want.
>> 
>> BTW, I still think we need RexNode for IN/ANY.
> I also feel that there is some need for IN nodes; but there are some good sides of not having it as well: like simplification handles them better.
> 
> 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>
>> 日 期:2019年10月04日 04:03:51
>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>
>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Haisheng,
>> 
>> Yes, that is what I would like to do. Unfortunately, I’m not sure how to proceed to actually do that. I was hoping for a pointer to an example that is similar?
>> 
>> Thanks!
>> Peter
>> 
>> From: Haisheng Yuan <h....@alibaba-inc.com>
>> Sent: Thursday, October 3, 2019 1:35 PM
>> To: Peter Wicks (pwicks) <pw...@micron.com>; dev@calcite.apache.org
>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, unfortunately.
>> 
>> You can create a Values node with these authorization data, and make a semi join with the table and Values you created.
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<pw...@micron.com>>
>> 日 期:2019年10月04日 02:34:02
>> 收件人:dev@calcite.apache.org<de...@calcite.apache.org>>
>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> A little detail about what I'm trying to do:
>> 
>> I have an external API that contains authorization information on a per user basis. I want users to be able to include an operation in their query that will filter data based on this authorization data.
>> 
>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements SqlRexConvertlet, and I am able to get this working. The user includes in their predicate statement `custom_authorize(column)`, my convertlet queries the API, gets the authorization rules, builds an OR statement, and the results come back. This works sometimes, but other times the OR condition becomes too large, and I run into CALCITE-2792: https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0, which causes a stackoverflow and my query dies.
>> 
>> So I tried converting to an IN statement, having read that IN statements are automatically converted to a sub query join when the default limit of 20 is exceeded. The problem is that this appears only to be true for IN statements that are included in the initial query. IN statements created as the result of a convertlet do not get modified, and are sent as an IN statement, which results in a failure to parse the query. I looked at how Calcite normally does this translation from IN to exists using a join, but it depends on a lot of classes/instances that aren't available in the SqlRexContext space. Is it possible to rewrite my IN statement to a join/exists query like Calcite normally does?
>> 
>> Also, am I doing things all wrong? Is there a better way to go about this?
>> 
>> Code Sample below is for the OR version, the commented code can be swapped in to see how I was building the IN statement.
>> 
>> @Override
>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>> HashSet<String> keyList = null;
>> try {
>> keyList = new Manager().getAllowedIDs(getContextInformation().getQueryUser());
>> } catch (SQLException e) {
>> e.printStackTrace();
>> }
>> 
>> final RexBuilder rexBuilder = cx.getRexBuilder();
>> final RexNode column = cx.convertExpression(call.operand(0));
>> 
>> final List<RexNode> nodes = new ArrayList<>();
>> for(String s: keyList) {
>> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
>> //nodes.add(rexBuilder.makeLiteral(s));
>> }
>> 
>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new RexNode[0]));
>> 
>> return in;
>> }
>> 
>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... nodes) {
>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>> }
>> 
>> 
>> Thanks,
>> Peter
>> 
> 
>