Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2015/06/22 09:37:00 UTC
[jira] [Comment Edited] (HIVE-11069) ColumnStatsTask doesn't work with hive.exec.parallel

    [ https://issues.apache.org/jira/browse/HIVE-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595462#comment-14595462 ] 

Ashutosh Chauhan edited comment on HIVE-11069 at 6/22/15 7:36 AM:
------------------------------------------------------------------

Dupe of HIVE-10677 ?


was (Author: ashutoshc):
Dupe of HIVE-10667 ?

> ColumnStatsTask doesn't work with hive.exec.parallel
> ----------------------------------------------------
>
>                 Key: HIVE-11069
>                 URL: https://issues.apache.org/jira/browse/HIVE-11069
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Rajat Jain
>
> Try a simple query:
> {code}
> hive> set hive.exec.parallel=true;
> hive> analyze table src compute statistics for columns;
> {code}
> It fails with errors similar to:
> {code}
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> hive> java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException
> 	at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
> 	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> Caused by: java.io.IOException: java.lang.InterruptedException
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1450)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1402)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> 	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2758)
> 	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2729)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
> 	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1954)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:765)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:691)
> 	... 7 more
> Caused by: java.lang.InterruptedException
> 	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
> 	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1049)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1444)
> 	... 28 more
> Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)'
> {code}
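> The problem is that the Column Stats Task (Stage-1) does not depend on the root task (Stage-0). Both are planned as root stages, so with hive.exec.parallel=true they are launched concurrently and the query fails. Here's the explain output: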
> {code}
> hive> explain analyze table src compute statistics for columns;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
>   Stage-1 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: src
>             Select Operator
>               expressions: key (type: string), value (type: string)
>               outputColumnNames: key, value
>               Group By Operator
>                 aggregations: compute_stats(key, 16), compute_stats(value, 16)
>                 mode: hash
>                 outputColumnNames: _col0, _col1
>                 Reduce Output Operator
>                   sort order:
>                   value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: compute_stats(VALUE._col0), compute_stats(VALUE._col1)
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           File Output Operator
>             compressed: false
>             table:
>                 input format: org.apache.hadoop.mapred.TextInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-1
>     Column Stats Work
>       Column Stats Desc:
>           Columns: key, value
>           Column Types: string, string
>           Table: default.src
> Time taken: 0.761 seconds, Fetched: 39 row(s)
> {code}
> For reference, here's the corresponding output in Hive 0.13:
> {code}
> hive> explain analyze table orders compute statistics for columns;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
>   Stage-1 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-0
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: orders
>             Statistics: Num rows: 0 Data size: 13310103552 Basic stats: PARTIAL Column stats: COMPLETE
>   Stage: Stage-1
>     Stats-Aggr Operator
> Time taken: 2.142 seconds, Fetched: 15 row(s)
> {code}
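
To illustrate why the missing dependency matters under hive.exec.parallel, here is a minimal sketch of a toy stage scheduler. It is not Hive's scheduler code: the function `launch_waves` and the dictionaries `plan_1_2_0` and `plan_0_13` are hypothetical names, and the only input taken from the report is the STAGE DEPENDENCIES section of each explain output above. Stages placed in the same wave are the ones a parallel executor would be free to start together.

```python
# Toy model of stage scheduling; an illustration only, not Hive's implementation.

def launch_waves(deps):
    """Group stages into waves; stages in the same wave may run concurrently.

    deps maps each stage name to the list of stages it depends on.
    """
    done = set()
    waves = []
    remaining = dict(deps)
    while remaining:
        # A stage is ready once every stage it depends on has finished.
        ready = sorted(s for s, d in remaining.items() if set(d) <= done)
        if not ready:
            raise ValueError("cyclic stage dependencies")
        waves.append(ready)
        done.update(ready)
        for stage in ready:
            del remaining[stage]
    return waves


# Dependencies as printed by the 1.2.0 explain output: both stages are roots.
plan_1_2_0 = {"Stage-0": [], "Stage-1": []}

# Dependencies as printed by the 0.13 explain output: Stage-1 waits for Stage-0.
plan_0_13 = {"Stage-0": [], "Stage-1": ["Stage-0"]}

print(launch_waves(plan_1_2_0))  # [['Stage-0', 'Stage-1']]
print(launch_waves(plan_0_13))   # [['Stage-0'], ['Stage-1']]
```

In the 1.2.0 plan both stages land in the first wave, so the column stats stage can start before the map-reduce stage has produced anything. In the 0.13 plan the stats stage only becomes ready after Stage-0 completes.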


