You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Sergey Nuyanzin (JIRA)" <ji...@apache.org> on 2018/07/10 16:01:00 UTC

[jira] [Comment Edited] (CALCITE-2402) COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY do not work

    [ https://issues.apache.org/jira/browse/CALCITE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538821#comment-16538821 ] 

Sergey Nuyanzin edited comment on CALCITE-2402 at 7/10/18 4:00 PM:
-------------------------------------------------------------------

Ok I see thank you

while working on it I faced with some issues/questions, could you please comment them?
# Are there any concerns to have SqlNodes (only in private) for expanding variance while in public there is also RexNode? Some more details related to the question. All not biased fail (denominator  {{count - 1}}) in boundary cases for window aggregated, e.g. {code}0: jdbc:calcite:model=target/test-classes/mod> select var_samp(age) over(partition by age) from emps;
+------------+
|   EXPR$0   |
+------------+
java.lang.ArithmeticException: / by zero
	at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
        final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
        denominator = new SqlCase(SqlParserPos.ZERO,
            count,
            SqlNodeList.of(
                SqlStdOperatorTable.EQUALS.createCall(pos,
                    count, SqlLiteral.createExactNumeric("1", SqlParserPos.ZERO))
            ),
            SqlNodeList.of(
                getCastedSqlNode(nullLiteral, varType, pos, null)
            ),
            SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlLiteral: NULL
        at org.apache.calcite.util.Util.needToImplement(Util.java:921)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
        at org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
        at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
        at org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
        ... 37 more{noformat} At the same time if I use RexBuilder and RexNode instead of SqlNodes for the similar CASE - everything works fine. 
 # For COVAR* and REGR* implementation I need a function to eliminate all rows where any of two arguments is null. Just double check if there is an existing function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}} could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they implemented a little different e.g. [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B], [PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]). Double check if Calcite's goes PostegreSQL way of not?


was (Author: sergey nuyanzin):
Ok I see thank you

while working on it I faced with some issues/questions, could you please comment them?
# Are there any concerns to have SqlNodes (only in private) for expanding variance while in public there is also RexNode? Some more details related to the question. All not biased fail (denominator  {{count - 1}}) in boundary cases for window aggregated, e.g. {code}0: jdbc:calcite:model=target/test-classes/mod> select var_samp(age) over(partition by age) from emps;
+------------+
|   EXPR$0   |
+------------+
java.lang.ArithmeticException: / by zero
	at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
        final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
        denominator = new SqlCase(SqlParserPos.ZERO,
            count,
            SqlNodeList.of(
                SqlStdOperatorTable.EQUALS.createCall(pos,
                    count, SqlLiteral.createExactNumeric("1", SqlParserPos.ZERO))
            ),
            SqlNodeList.of(
                getCastedSqlNode(nullLiteral, varType, pos, null)
            ),
            SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlLiteral: NULL
        at org.apache.calcite.util.Util.needToImplement(Util.java:921)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
        at org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
        at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
        at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
        at org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
        ... 37 more{noformat} At the same time if I use RexBuilder and RexNode instead of SqlNodes - everything works fine. 
 # For COVAR* and REGR* implementation I need a function to eliminate all rows where any of two arguments is null. Just double check if there is an existing function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}} could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they implemented a little different e.g. [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B], [PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]). Double check if Calcite's goes PostegreSQL way of not?

> COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY  do not work
> ---------------------------------------------------
>
>                 Key: CALCITE-2402
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2402
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Sergey Nuyanzin
>            Assignee: Julian Hyde
>            Priority: Major
>
> Any query from {code:sql}
> select covar_samp(empno, deptno) from emps;
> select covar_pop(empno, deptno) from emps;
> select regr_sxx(empno, deptno) from emps;
> select regr_syy(empno, deptno) from emps;
> {code} 
> fails (the trace is below) 
> As I understand the reason is not fully implementation (did not find any convertlet for them e.g.). 
> From my point of view I could fix this issue however I have a question: 
> How these function should be handled? As reducible functions (like STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP) or as Rex operators in RexImpTable? 
> Please give some advice here
> {noformat}0: jdbc:calcite:model=target/test-classes/mod> select covar_samp(empno, deptno) from emps;
> Error: Error while executing SQL "select covar_samp(empno, deptno) from emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]), rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $1)), rowcount=10.0, cumulative cost={inf}
>                 rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]), rowcount=10.0, cumulative cost={inf} (state=,code=0)
> java.sql.SQLException: Error while executing SQL "select covar_samp(empno, deptno) from emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]), rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $1)), rowcount=10.0, cumulative cost={inf}
>                 rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]), rowcount=10.0, cumulative cost={inf}
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>         at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
>         at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>         at sqlline.Commands.execute(Commands.java:823)
>         at sqlline.Commands.sql(Commands.java:733)
>         at sqlline.SqlLine.dispatch(SqlLine.java:795)
>         at sqlline.SqlLine.begin(SqlLine.java:668)
>         at sqlline.SqlLine.start(SqlLine.java:373)
>         at sqlline.SqlLine.main(SqlLine.java:265)
> Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]), rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2), rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $1)), rowcount=10.0, cumulative cost={inf}
>                 rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0, $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]), rowcount=10.0, cumulative cost={inf}
>         at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448)
>         at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298)
>         at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
>         at org.apache.calcite.tools.Programs$5.run(Programs.java:326)
>         at org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
>         at org.apache.calcite.prepare.Prepare.optimize(Prepare.java:188)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:230)
>         at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:783)
>         at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:642)
>         at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:612)
>         at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:232)
>         at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:609)
>         at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>         at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>         ... 7 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)