Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/05/22 13:56:00 UTC

[jira] [Commented] (HIVE-23532) NPE when fetching incomplete column statistics from the metastore

    [ https://issues.apache.org/jira/browse/HIVE-23532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114061#comment-17114061 ] 

Stamatis Zampetakis commented on HIVE-23532:
--------------------------------------------

The HIVE-23532.01.patch contains only a minimal reproducer. 

Apart from tests, this scenario should not occur very often, so I am giving it a low priority.

> NPE when fetching incomplete column statistics from the metastore
> -----------------------------------------------------------------
>
>                 Key: HIVE-23532
>                 URL: https://issues.apache.org/jira/browse/HIVE-23532
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Stamatis Zampetakis
>            Priority: Minor
>         Attachments: HIVE-23532.01.patch
>
>
> Certain operations may store incomplete column statistics in the metastore. Fetching those statistics back from the metastore leads to a {{NullPointerException}}.
> For instance, consider a column "name" of type string. If statistics exist for this column, then the following info must be available:
> * maxColLen; 
> * avgColLen; 
> * numNulls; 
> * numDVs; 
> Executing the following statement on a table with no stats sets only a subset of the statistics for this column (numDVs and numNulls), leaving maxColLen and avgColLen unset:
> {code:sql}
> ALTER TABLE example UPDATE STATISTICS FOR COLUMN name SET ('numDVs'='242', 'numNulls'='5');
> {code}
> Fetching such statistics leads to an NPE that sometimes surfaces in the client and at other times is buried in the logs, resulting in incomplete column stats during query optimization and execution.
> Usually, the stack trace is similar to the one below:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getTableColumnStatistics(Hive.java:5251)
> 	at org.apache.hadoop.hive.ql.ddl.table.info.desc.DescTableOperation.getColumnDataColPathSpecified(DescTableOperation.java:216)
> 	at org.apache.hadoop.hive.ql.ddl.table.info.desc.DescTableOperation.execute(DescTableOperation.java:94)
> 	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> 	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:362)
> 	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:335)
> 	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> 	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:723)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:492)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:486)
> 	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
> 	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
> 	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:730)
> 	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:700)
> 	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> 	at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.junit.runners.Suite.runChild(Suite.java:127)
> 	at org.junit.runners.Suite.runChild(Suite.java:26)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> 	at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:95)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:236)
> 	at com.sun.proxy.$Proxy62.getTableColumnStatistics(Unknown Source)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.getTableColumnStatistics(Hive.java:5246)
> 	... 61 more
> {noformat}
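The failure mode described in the issue can be illustrated with a small, self-contained Java sketch. It assumes, hypothetically, that a stored statistics entry for a string column is loaded into nullable boxed fields; `StringColStatsSketch` and its methods are illustrative names only, not part of Hive or the metastore API, and the sketch does not claim to reproduce the actual code path in the stack trace above. After the `ALTER TABLE ... SET ('numDVs'='242', 'numNulls'='5')` statement, only two of the four required fields would be populated. A completeness check can report which ones are missing, whereas naive auto-unboxing of a missing field throws a `NullPointerException`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the statistics a string column needs
 * (maxColLen, avgColLen, numNulls, numDVs). This is NOT Hive's
 * StringColumnStatsData; a null field stands for a value that was never set.
 */
final class StringColStatsSketch {
    final Long maxColLen;
    final Double avgColLen;
    final Long numNulls;
    final Long numDVs;

    StringColStatsSketch(Long maxColLen, Double avgColLen, Long numNulls, Long numDVs) {
        this.maxColLen = maxColLen;
        this.avgColLen = avgColLen;
        this.numNulls = numNulls;
        this.numDVs = numDVs;
    }

    /** Returns the names of the required statistics that are missing (null). */
    List<String> missingFields() {
        List<String> missing = new ArrayList<>();
        if (maxColLen == null) missing.add("maxColLen");
        if (avgColLen == null) missing.add("avgColLen");
        if (numNulls == null) missing.add("numNulls");
        if (numDVs == null) missing.add("numDVs");
        return missing;
    }

    /** True only when all four required statistics are present. */
    boolean isComplete() {
        return missingFields().isEmpty();
    }

    /**
     * Naive accessor: auto-unboxing a null Long throws NullPointerException,
     * which is how a partially populated entry can fail far from its origin.
     */
    long unsafeMaxColLen() {
        return maxColLen;
    }
}
```

For example, `new StringColStatsSketch(null, null, 5L, 242L).missingFields()` returns `[maxColLen, avgColLen]`, while calling `unsafeMaxColLen()` on the same object throws a `NullPointerException`. Checking completeness before converting to primitives (or treating incomplete statistics as absent) is one way a caller could degrade gracefully instead of failing.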



--
This message was sent by Atlassian Jira
(v8.3.4#803005)