Posted to user@hive.apache.org by PradeepKumar Yadav <Pr...@protegrity.com> on 2018/05/14 06:44:45 UTC

RE: Hive Custom UDF evaluate behavior when @UDFType is set

Hi,
Any updates on this? We need to understand this behavior before our release, which is scheduled for next month.
Thanks in advance.

From: PradeepKumar Yadav
Sent: Tuesday, April 17, 2018 6:41 PM
To: 'Jason Dere' <jd...@hortonworks.com>; user@hive.apache.org
Subject: RE: Hive Custom UDF evaluate behavior when @UDFType is set

Hi,
Attaching logs with changes.
hiveDeterministicFalseLog - log for testUdf with @UDFType( deterministic = false )
hiveNoUDFTypeAnnotationLog - log for testUdf with no @UDFType annotation
Thanks

From: Jason Dere [mailto:jdere@hortonworks.com]
Sent: Monday, April 16, 2018 11:18 PM
To: PradeepKumar Yadav <Pr...@protegrity.com>; user@hive.apache.org
Subject: Re: Hive Custom UDF evaluate behavior when @UDFType is set


I'd suggested logging the stack trace of the call; the attached logs don't really give much information about where the calls occur during query compilation/execution.

Try logger.info("************Inside testUdf Initialize***************", new Exception("initialize"));
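
The suggestion above relies on the fact that an exception records the call stack when it is created, even if it is never thrown. The following is a minimal sketch of that idea using only the JDK, so it runs without Hive on the classpath. The class StackTraceSketch and its methods are hypothetical stand-ins, not part of Hive or of the attached testUdf.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a UDF class; not a Hive class.
class StackTraceSketch {

    // Render the current call stack as text by creating (not throwing) an exception.
    static String currentStack(String label) {
        StringWriter buffer = new StringWriter();
        new Exception(label).printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    // Plays the role of the UDF's evaluate(): it reports who called it.
    static String evaluate() {
        return currentStack("evaluate");
    }

    public static void main(String[] args) {
        // In a real UDF this text would go to the logger instead of stdout.
        System.out.println(evaluate());
    }
}
```

In an actual UDF the same effect comes from passing the exception as the last argument to the logger, as in the line above; the frames below evaluate() then show which part of compilation or execution made the call.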





________________________________
From: PradeepKumar Yadav <Pr...@protegrity.com>
Sent: Monday, April 16, 2018 4:53 AM
To: user@hive.apache.org
Cc: Jason Dere
Subject: FW: Hive Custom UDF evaluate behavior when @UDFType is set

Hi,
Regarding my previous mail, I have attached the following observation documents:
1. testUdfNoAnnotation.java - contains the UDF code with no @UDFType annotation.
2. hive-default-No-annotation-log.txt - HiveServer2 logs after executing the UDF created from the above class.
3. hive-default-udf-annotation.jpg - the beeline output after creating and executing the UDF created from the above class.
4. testUdf.java - contains the UDF code with @UDFType( deterministic = false ).
5. hive-deterministic-false-log.txt - JobHistory logs after executing the UDF created from the above class.
6. hive-deterministic-false.jpg - the beeline output after creating and executing the UDF created from the above class.

Thanks,
PradeepKumar Yadav
From: Jason Dere [mailto:jdere@hortonworks.com]
Sent: Wednesday, April 11, 2018 12:02 AM
To: user@hive.apache.org
Subject: Re: Hive Custom UDF evaluate behavior when @UDFType is set


This might have to do with constant propagation, because the function was listed as deterministic. You can try logging the stack trace during execution and pasting both stack traces here; that may give more clues as to what is going on.



________________________________
From: PradeepKumar Yadav <Pr...@protegrity.com>
Sent: Monday, April 9, 2018 11:35 PM
To: user@hive.apache.org
Subject: Hive Custom UDF evaluate behavior when @UDFType is set

Hi,
While creating a custom generic Hive UDF recently, I came across unexpected behavior in the evaluate method. The UDF increments a counter and writes it to a file. When I execute it directly, without involving any table, it always returns one extra count, i.e. 2.
When I added logging inside the evaluate method, I observed that the logs (sysout) were printed twice. On further research I came across the @UDFType annotation and found that if it is not provided in a custom UDF, the default is deterministic = true.
When I add this annotation to my custom UDF and set @UDFType( deterministic = false ), the logs are printed only once and the UDF returns the accurate count, i.e. 1, which implies that evaluate is called only once when @UDFType( deterministic = false ).
I would like to understand the connection between @UDFType and the evaluate method when the UDF is invoked directly without a table.

Note: When I invoke my UDF on a table I get the correct count even with @UDFType( deterministic = true ).

Thanks in advance. :)
Regards,
PradeepKumar Yadav

Re: Hive Custom UDF evaluate behavior when @UDFType is set

Posted by Jason Dere <jd...@hortonworks.com>.
It looks like there are 2 separate places where constant folding is occurring:


java.lang.Exception: Evaluate
        at com.protegrity.hive.udf.testUdf.evaluate(testUdf.java:38)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:145)
        at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:232)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:958)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1168)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:192)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:145)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10530)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10486)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3720)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3499)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9011)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8966)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9812)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9705)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10141)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:286)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10152)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)


And


java.lang.Exception: Evaluate
        at com.protegrity.hive.udf.testUdf.evaluate(testUdf.java:38)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.evaluateFunction(ConstantPropagateProcFactory.java:533)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:238)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:92)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateSelectProc.process(ConstantPropagateProcFactory.java:735)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagate$ConstantPropagateWalker.walk(ConstantPropagate.java:147)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
        at org.apache.hadoop.hive.ql.optimizer.ConstantPropagate.transform(ConstantPropagate.java:117)
        at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10207)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)



I don't think this is necessarily a bug, just how constants are folded/propagated in Hive.

I'm actually surprised you did not hit this when running the query against tables, unless the UDF was taking in parameters based on the table's column values (in which case there is no constant propagation).
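
To make the behavior concrete, here is a toy model of what the two stack traces above and the reported counts suggest. It is only a sketch: FoldingSketch, Call, and runQuery are hypothetical names, none of this is Hive code, and the assumption that the folded value is reused at execution time is inferred from the reported count of 2 rather than checked against the Hive source.

```java
// Toy model only: none of these types are Hive classes.
class FoldingSketch {

    // A function call in a query, with a counter for how often it is evaluated.
    static final class Call {
        final boolean deterministic;    // what @UDFType(deterministic = ...) would declare
        final boolean allArgsConstant;  // true for "SELECT udf()", false for "SELECT udf(col) FROM t"
        int evaluateCalls = 0;

        Call(boolean deterministic, boolean allArgsConstant) {
            this.deterministic = deterministic;
            this.allArgsConstant = allArgsConstant;
        }

        Object evaluate() {
            evaluateCalls++;
            return "value";
        }
    }

    // Compile and run a query over the given number of rows; return how often evaluate() ran.
    static int runQuery(Call call, int rows) {
        boolean folded = false;
        // Two independent compile-time passes (type checking, then the optimizer),
        // each of which evaluates the call early when folding looks safe.
        for (int pass = 0; pass < 2; pass++) {
            if (call.deterministic && call.allArgsConstant) {
                call.evaluate();
                folded = true;
            }
        }
        // Assumption: a folded call is replaced by its constant, so it is not run per row.
        if (!folded) {
            for (int row = 0; row < rows; row++) {
                call.evaluate();
            }
        }
        return call.evaluateCalls;
    }

    public static void main(String[] args) {
        System.out.println(runQuery(new Call(true, true), 1));   // deterministic, no table
        System.out.println(runQuery(new Call(false, true), 1));  // deterministic = false, no table
        System.out.println(runQuery(new Call(true, false), 3));  // deterministic, column argument
    }
}
```

Under these assumptions the model reproduces the observations in this thread: two calls for the default (deterministic) UDF without a table, one call with deterministic = false, and one call per row when the argument comes from a column. A UDF with side effects, such as the counter written to a file, is therefore a candidate for deterministic = false.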


