Posted to issues@hive.apache.org by "Robbie Zhang (Jira)" <ji...@apache.org> on 2021/03/02 12:52:00 UTC
[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293674#comment-17293674 ]
Robbie Zhang commented on HIVE-24839:
-------------------------------------
The following backtrace appears in the HS2 log file:
{code:java}
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177)
at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156)
at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576)
at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435)
at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447)
at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
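The expression "substr(t0.s, t1.i-1)" contains a nested function: the second parameter of substr is actually a GenericUDFOPMinus expression. The ColStatistics for that expression doesn't have a valid range (the Range object itself appears to be null), but getRangeWidth dereferences the range without checking for that: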
{code:java}
private Optional<Double> getRangeWidth(Range range) {
  if (range.minValue != null && range.maxValue != null) {
    return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
  }
  return Optional.empty();
}
{code}
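One possible way to avoid the exception is to treat a missing range the same way as a range with missing bounds. The standalone sketch below illustrates that idea only; the Range class in it is a simplified stand-in for Hive's ColStatistics.Range, the class name RangeWidthSketch is hypothetical, and it is not necessarily the patch that will be committed.

```java
import java.util.Optional;

// Hypothetical standalone sketch; not the actual Hive source.
class RangeWidthSketch {

    // Simplified stand-in for Hive's ColStatistics.Range (assumption:
    // only the two nullable bounds matter for this illustration).
    static final class Range {
        final Number minValue;
        final Number maxValue;

        Range(Number minValue, Number maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    // Same logic as the quoted method, plus a null check on the range itself.
    static Optional<Double> getRangeWidth(Range range) {
        if (range != null && range.minValue != null && range.maxValue != null) {
            return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
        }
        // No usable range: let the caller fall back to its default estimate.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A null range (as for the nested "t1.i-1" expression) no longer throws.
        System.out.println(getRangeWidth(null));
        // A range with both bounds still yields its width.
        System.out.println(getRangeWidth(new Range(1, 4)));
    }
}
```

With this guard, a null range and a range with a missing bound both produce an empty Optional, so the caller only has to handle one "no width available" case.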
Only 4 UDF classes implement StatEstimatorProvider and only UDFSubstr has this bug.
> SubStrStatEstimator.estimate throws NullPointerException
> --------------------------------------------------------
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
> Issue Type: Bug
> Reporter: Robbie Zhang
> Assignee: Robbie Zhang
> Priority: Major
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with the following error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=40000)
> {code}
>
>