You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2017/11/01 18:19:47 UTC

Re: Advice to enhance the RexBuildler's addAggCall method

> Though it seems Drill's type inference effect caused this error. But as a
> common util method , it's better to refactor the implementation to keep the
> interface promise ,not dependent on outside's rule.

You’re expounding the Robustness Principle [1], "Be conservative in what you send, be liberal in what you accept“ but in this case I don’t agree. Because if your code creates a RexNode or AggregateCall that breaks the general constraints on expressions, then code that someone else wrote may be broken by it. You have found only one piece of code, but there may be many more.

I think the solution is to keep the constraints that we have, but add more validation in Calcite. With this validation Drill would have discovered the problem earlier. Sounds like it should have generated sum0(cast($1 as bigint)) and this would have solved the problem.

Julian

[1] https://en.wikipedia.org/wiki/Robustness_principle <https://en.wikipedia.org/wiki/Robustness_principle> 


> On Oct 30, 2017, at 11:00 PM, weijie tong <to...@gmail.com> wrote:
> 
> HI Julian:
> thanks for your response, since I send the last mail from my phone ,I can
> not describe the problem very well. sorry for that.
> 
> error sql:
> 
> select stddev_samp(cast(employee_id as int)) as col
> ,sum(cast(employee_id as int)) as col2 from cp.`employee.json`
> 
> 
> error stack:
> 
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> 
> 
> [Error Id: 043be52f-dff2-4839-aa8b-fc1bf2c49698 on localhost:31010]
> 	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
> ~[classes/:na]
> 	at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:788)
> [classes/:na]
> 
> 
> The generated sub functions and its data type varied through the aggregate
> reduce phase :
> 
> initial  :                                 stddev_samp($0) double
>                             ,            sum($0) bigint
>                                                           |
>                                                                  |
>                                                         \ | /
>                                                                 \|/
> first rule match :        sum($1) integer ,sum($0) integer , count($0)
> bigint          ,             sum0($0)  bigint
>                                                            |
>                                                                    |
>                                                           \|/
>                                                                  \|/
> second rule mach:   sum0($1) integer, sum0($0) integer, count($0) bigint
>    ,              sum0($0)  bigint
> 
> As you can see from the last graph, there's two same function sum0($0) ,but
> different data types, The RexBuildler.addAggCall method now does consider
> the AggregateCall's data type with the existing ones in the AggregateCall
> --> RexNode map. So drill will mistake the wrong RexInputRef.
> 
> There's no doubt that RexNode is strict regarding types. I have also
> examined Calcite's AggregateReduceFunctionsRule.It seems  Calcite will not
> occur this error through my experiment. The reason is Drill has type
> inference effect while Calcite does not. The initial input field $0 is
> integer , to Drill ,the sum(cast (employee_id as int)) is bigint , the
> reduced sum0($0) is bigint through the inference which takes the sum($0)'s
> data type as its data type , to Calcite , the corresponding sum0($0) will
> be integer as its data type is taken from the Aggregate's input field's
> data type which is integer.
> 
> Though it seems Drill's type inference effect caused this error. But as a
> common util method , it's better to refactor the implementation to keep the
> interface promise ,not dependent on outside's rule.
> 
> 
> The refactor content maybe like below:
> 
> public RexNode addAggCall(AggregateCall aggCall, int groupCount,
>    boolean indicator, List<AggregateCall> aggCalls,
>    Map<AggregateCall, RexNode> aggCallMapping,
>    final List<RelDataType> aggArgTypes) {
>  if (aggCall.getAggregation() instanceof SqlCountAggFunction
>      && !aggCall.isDistinct()) {
>    final List<Integer> args = aggCall.getArgList();
>    final List<Integer> nullableArgs = nullableArgs(args, aggArgTypes);
>    if (!nullableArgs.equals(args)) {
>      aggCall = aggCall.copy(nullableArgs, aggCall.filterArg);
>    }
>  }
> 
>  for (AggregateCall aggregateCall : aggCallMapping.keySet()) {
>    if (aggregateCall.equals(aggCall) &&
> !aggregateCall.getType().equals(aggCall.getType())) {
>      int index = aggCalls.size() + groupCount * (indicator ? 2 : 1);
>      aggCalls.add(aggCall);
>      RexNode rex = makeInputRef(aggCall.getType(), index);
>      return rex;
>    }
>  }
> 
>  RexNode rex = aggCallMapping.get(aggCall);
>  if (rex == null) {
>    int index = aggCalls.size() + groupCount * (indicator ? 2 : 1);
>    aggCalls.add(aggCall);
>    rex = makeInputRef(aggCall.getType(), index);
>    aggCallMapping.put(aggCall, rex);
>  }
>  return rex;
> }
> 
> 
> 
> 
> 
> 
> On Tue, Oct 31, 2017 at 1:06 AM, Julian Hyde <jh...@apache.org> wrote:
> 
>> Maybe I don’t fully understand the problem, but a RexInputRef must always
>> have the same type as the field it references. If field 1 is an INTEGER,
>> then RefInputRef(1) must be an INTEGER. If one consumer wants that field as
>> a BIGINT then they must apply a cast (RexCall to the CAST operator).
>> 
>> If you are creating a RexInputRef with the type that you want it to be,
>> then bad things will happen.
>> 
>> Sorry if this is confusing/unexpected. RexNode is very strict regarding
>> types.
>> 
>> Julian
>> 
>> 
>>> On Oct 30, 2017, at 8:41 AM, weijie tong <to...@gmail.com>
>> wrote:
>>> 
>>> Hi devs:
>>> At drill,stddev_pop and sum functions once issued together will occur
>>> error.The reason is the DrillReduceAggregatesRule leverages Calcite's
>>> RexBuilder.addAggCall method to reduce the duplicated functions.But this
>>> method only cares about function name and input references ignoring the
>>> parameter data type.
>>> More details please see https://github.com/apache/drill/pull/1016
>>> 
>>> But I think it's better  to move my judgement logic codes to the
>>> RexBuilder's addAggCall method.
>>> So what do you think about it ? If it sounds ok ,I will be pleasure to do
>>> that.
>> 
>>