Posted to issues@flink.apache.org by "luoyuxia (Jira)" <ji...@apache.org> on 2022/12/08 08:34:00 UTC

[jira] [Commented] (FLINK-28158) Flink supports all modes of Hive UDAF (PARTIAL1, PARTIAL2, FINAL, COMPLETE)

    [ https://issues.apache.org/jira/browse/FLINK-28158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644664#comment-17644664 ] 

luoyuxia commented on FLINK-28158:
----------------------------------

percent_rank is supported in FLINK-27620. I think it may not be a valid issue, so I'll close it. Feel free to reopen it if you have other thoughts.

> Flink supports all modes of Hive UDAF (PARTIAL1, PARTIAL2, FINAL, COMPLETE)
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-28158
>                 URL: https://issues.apache.org/jira/browse/FLINK-28158
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>    Affects Versions: 1.15.0
>            Reporter: tartarus
>            Assignee: luoyuxia
>            Priority: Major
>
> Currently, Flink's UDAF support only covers Hive UDAF's PARTIAL1 and FINAL modes.
> When Flink uses Hive's UDAF percent_rank, it fails with the following exception message:
> {code:java}
> org.apache.flink.table.api.TableException: Unexpected error in type inference logic of function 'percent_rank'. This is a bug.
>     at org.apache.flink.table.types.inference.TypeInferenceUtil.createUnexpectedException(TypeInferenceUtil.java:206)
>     at org.apache.flink.table.planner.functions.inference.TypeInferenceReturnInference.inferReturnType(TypeInferenceReturnInference.java:80)
>     at org.apache.calcite.sql.SqlOperator.inferReturnType(SqlOperator.java:482)
>     at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:283)
>     at org.apache.calcite.rex.RexBuilder.makeCall(RexBuilder.java:257)
>     at org.apache.flink.table.planner.delegation.hive.SqlFunctionConverter.visitOver(SqlFunctionConverter.java:121)
>     at org.apache.flink.table.planner.delegation.hive.SqlFunctionConverter.visitOver(SqlFunctionConverter.java:56)
>     at org.apache.calcite.rex.RexOver.accept(RexOver.java:121)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.getWindowRexAndType(HiveParserCalcitePlanner.java:1859)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genSelectForWindowing(HiveParserCalcitePlanner.java:1913)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genSelectLogicalPlan(HiveParserCalcitePlanner.java:2002)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:2751)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.logicalPlan(HiveParserCalcitePlanner.java:284)
>     at org.apache.flink.table.planner.delegation.hive.HiveParserCalcitePlanner.genLogicalPlan(HiveParserCalcitePlanner.java:272)
>     at org.apache.flink.table.planner.delegation.hive.HiveParser.analyzeSql(HiveParser.java:303)
>     at org.apache.flink.table.planner.delegation.hive.HiveParser.processCmd(HiveParser.java:251)
>     at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:211)
>     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:695)
>     at org.apache.flink.connectors.hive.HiveDialectITCase.testPercent_rank(HiveDialectITCase.java:800)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>     at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
> Caused by: org.apache.flink.table.functions.hive.FlinkHiveUDFException: Failed to get Hive result type from org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank
>     at org.apache.flink.table.functions.hive.HiveGenericUDAF.inferReturnType(HiveGenericUDAF.java:249)
>     at org.apache.flink.table.functions.hive.HiveFunction$HiveFunctionOutputStrategy.inferType(HiveFunction.java:122)
>     at org.apache.flink.table.types.inference.TypeInferenceUtil.inferOutputType(TypeInferenceUtil.java:151)
>     at org.apache.flink.table.planner.functions.inference.TypeInferenceReturnInference.inferReturnTypeOrError(TypeInferenceReturnInference.java:99)
>     at org.apache.flink.table.planner.functions.inference.TypeInferenceReturnInference.inferReturnType(TypeInferenceReturnInference.java:76)
>     ... 44 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for Rank function
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$GenericUDAFAbstractRankEvaluator.init(GenericUDAFRank.java:124)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank$GenericUDAFPercentRankEvaluator.init(GenericUDAFPercentRank.java:59)
>     at org.apache.flink.table.functions.hive.HiveGenericUDAF.init(HiveGenericUDAF.java:99)
>     at org.apache.flink.table.functions.hive.HiveGenericUDAF.inferReturnType(HiveGenericUDAF.java:243)
>     ... 48 more {code}
> According to the exception message, we can see that this is because the percent_rank function requires COMPLETE mode.
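> To illustrate the mode restriction outside of Flink, here is a minimal sketch against Hive's GenericUDAFEvaluator API: the rank-family evaluators reject any mode other than COMPLETE in init(), which is what Flink's HiveGenericUDAF hits when it initializes the evaluator in PARTIAL1/FINAL mode. The single string parameter (TypeInfoFactory.stringTypeInfo / javaStringObjectInspector) just stands in for the order-by key and is only illustrative.
> {code:java}
> import org.apache.hadoop.hive.ql.metadata.HiveException;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
> import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
> import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
>
> public class PercentRankModeSketch {
>     public static void main(String[] args) throws Exception {
>         // percent_rank() over(... order by key): the order-by key is the single parameter.
>         TypeInfo[] paramTypes = new TypeInfo[] {TypeInfoFactory.stringTypeInfo};
>         ObjectInspector[] paramOIs =
>                 new ObjectInspector[] {PrimitiveObjectInspectorFactory.javaStringObjectInspector};
>
>         // COMPLETE mode: the only mode the rank-family evaluators accept.
>         GenericUDAFEvaluator completeEval = new GenericUDAFPercentRank().getEvaluator(paramTypes);
>         completeEval.init(Mode.COMPLETE, paramOIs); // succeeds
>
>         // PARTIAL1 mode (one of the modes Flink used, per the description above):
>         // init() throws "Only COMPLETE mode supported for Rank function".
>         GenericUDAFEvaluator partialEval = new GenericUDAFPercentRank().getEvaluator(paramTypes);
>         try {
>             partialEval.init(Mode.PARTIAL1, paramOIs);
>         } catch (HiveException e) {
>             System.out.println(e.getMessage());
>         }
>     }
> } {code}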
> We can reproduce it with the following ITCase:
> {code:java}
> @Test
> public void testPercent_rank() throws Exception {
>     // automatically load hive module in hive-compatible mode
>     HiveModule hiveModule = new HiveModule(hiveCatalog.getHiveVersion());
>     CoreModule coreModule = CoreModule.INSTANCE;
>     for (String loaded : tableEnv.listModules()) {
>         tableEnv.unloadModule(loaded);
>     }
>     tableEnv.loadModule("hive", hiveModule);
>     tableEnv.loadModule("core", coreModule);
>     // Flink UDAF only supports Hive UDAF's PARTIAL1 and FINAL modes.
>     tableEnv.executeSql(
>             "create temporary function percent_rank as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentRank'");
>     tableEnv.executeSql(
>             "create table cbo_t1(key string, value string, c_int int, c_float float, c_boolean boolean)");
>     List<Row> results =
>             CollectionUtil.iteratorToList(
>                     tableEnv.executeSql(
>                                     "select percent_rank() over(partition by c_float order by key) from cbo_t1")
>                             .collect());
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)