Posted to issues@hive.apache.org by "Robbie Zhang (Jira)" <ji...@apache.org> on 2021/03/02 12:52:00 UTC

[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

    [ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293674#comment-17293674 ] 

Robbie Zhang commented on HIVE-24839:
-------------------------------------

The HS2 log file shows the following backtrace:
{code:java}
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177)
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
The expression "substr(t0.s, t1.i-1)" contains a nested function: the second parameter of substr is a GenericUDFOPMinus expression. The ColStatistics for that expression has no valid range, so the Range passed to getRangeWidth is null. getRangeWidth checks minValue and maxValue, but never checks range itself:
{code:java}
    private Optional<Double> getRangeWidth(Range range) {
      if (range.minValue != null && range.maxValue != null) {
        return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
      }
      return Optional.empty();
    }
{code}
Only four UDF classes implement StatEstimatorProvider, and only UDFSubstr has this bug.
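
A minimal sketch of one possible guard, not necessarily the fix that will be committed: treat a null range the same way as a range with missing bounds. The RangeWidthSketch class and its nested Range type are hypothetical stand-ins so the snippet runs on its own; in Hive the method is private and takes the Range type used by ColStatistics.

```java
import java.util.Optional;

public class RangeWidthSketch {

    // Hypothetical stand-in for the Range type used by ColStatistics in Hive.
    public static final class Range {
        public final Number minValue;
        public final Number maxValue;

        public Range(Number minValue, Number maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    // Same logic as the method quoted above, plus a null check on range itself.
    public static Optional<Double> getRangeWidth(Range range) {
        if (range != null && range.minValue != null && range.maxValue != null) {
            return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // No range at all, as with the statistics of the nested t1.i-1 expression.
        System.out.println(getRangeWidth(null));               // Optional.empty
        // Both bounds known.
        System.out.println(getRangeWidth(new Range(1, 4)));    // Optional[3.0]
        // One bound missing: already handled by the original checks.
        System.out.println(getRangeWidth(new Range(null, 4))); // Optional.empty
    }
}
```

With a guard like this, the estimator would fall back to the "no width known" path instead of throwing during stats annotation.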

> SubStrStatEstimator.estimate throws NullPointerException
> --------------------------------------------------------
>
>                 Key: HIVE-24839
>                 URL: https://issues.apache.org/jira/browse/HIVE-24839
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>
> This issue can be reproduced by running the following queries:
> {code:sql}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code}
> Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=40000)
> {code}