Posted to dev@hive.apache.org by "Chao (JIRA)" <ji...@apache.org> on 2014/11/13 20:01:33 UTC
[jira] [Created] (HIVE-8859) ColumnStatsTask fails because of SparkMapJoinResolver
Chao created HIVE-8859:
--------------------------
Summary: ColumnStatsTask fails because of SparkMapJoinResolver
Key: HIVE-8859
URL: https://issues.apache.org/jira/browse/HIVE-8859
Project: Hive
Issue Type: Sub-task
Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
The following query fails:
{code}
ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS key,value;
{code}
The plan looks like:
{noformat}
STAGE DEPENDENCIES:
  Stage-0 is a root stage
  Stage-2 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 1)
      DagName: chao_20141113105959_486b4bba-a2da-43c5-bf42-0ee69cd42576:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: string), value (type: string)
                    outputColumnNames: key, value
                    Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: compute_stats(key, 16), compute_stats(value, 16)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                        value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: compute_stats(VALUE._col0), compute_stats(VALUE._col1)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>), _col1 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Column Stats Work
      Column Stats Desc:
          Columns: key, value
          Column Types: string, string
          Table: src
{noformat}
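In the plan above, compute_stats runs as a two-phase aggregation: Map 1 builds partial states (mode: hash) and Reducer 2, fed by a single GROUP partition, merges them (mode: mergepartial) into the final per-column struct. The following is a hypothetical, simplified Java model of that shape for one string column, not Hive's actual UDAF code. The class and method names are made up; only the maxlength, sumlength, count and countnulls fields of the partial struct are modeled, and the bit-vector estimate behind numdistinctvalues is omitted. In this sketch, count tracks non-null values only.

```java
/**
 * Hypothetical model of a two-phase compute_stats-style aggregation for a
 * string column. Not Hive's implementation; NDV (bit vectors) is omitted.
 */
public class ComputeStatsSketch {

    /** Partial state, mirroring a subset of the fields in the map-side struct. */
    static final class Partial {
        long maxLength;
        long sumLength;
        long count;      // in this sketch: number of non-null values seen
        long countNulls;

        /** Map side (mode: hash): fold one value into the partial state. */
        void iterate(String value) {
            if (value == null) {
                countNulls++;
                return;
            }
            long len = value.length();
            maxLength = Math.max(maxLength, len);
            sumLength += len;
            count++;
        }

        /** Reduce side (mode: mergepartial): combine another partial state into this one. */
        void merge(Partial other) {
            maxLength = Math.max(maxLength, other.maxLength);
            sumLength += other.sumLength;
            count += other.count;
            countNulls += other.countNulls;
        }

        /** Final output: avglength is derived from sumlength and count. */
        double avgLength() {
            return count == 0 ? 0.0 : (double) sumLength / count;
        }
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each seeing a slice of the column.
        Partial split1 = new Partial();
        for (String v : new String[] {"val_0", "val_10", null}) {
            split1.iterate(v);
        }
        Partial split2 = new Partial();
        for (String v : new String[] {"val_100", "val_2"}) {
            split2.iterate(v);
        }

        // A single reducer merges every partial state.
        Partial merged = new Partial();
        merged.merge(split1);
        merged.merge(split2);

        // prints "maxlength=7 avglength=5.75 countnulls=1"
        System.out.println("maxlength=" + merged.maxLength
                + " avglength=" + merged.avgLength()
                + " countnulls=" + merged.countNulls);
    }
}
```

Splitting the work this way keeps map output to one small row per mapper, which is why the Reduce Output Operator above carries only the two partial structs.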
This query fails because {{SparkMapJoinResolver#createSparkTask}} swaps the order of the two tasks (Stage-0 and Stage-2) in the root task list. This is surprising: since both are root tasks, their relative order should not matter.
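The report leaves open why order matters between two root tasks. One general way it can: a root task consumes output that another root task produces, with no dependency edge recording that. As an assumption the report does not confirm, suppose Stage-2 (Column Stats Work) reads the rows written by Stage-0. The hypothetical Java model below (its {{Task}} interface and task names are invented, not Hive's classes) runs root tasks strictly in list order; swapping the list, as {{createSparkTask}} is reported to do, makes the consumer run before its producer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model: root tasks run in list order, and one reads a result
 * the other writes without any declared dependency between them.
 */
public class RootTaskOrderSketch {

    /** Invented task abstraction; not Hive's exec Task class. */
    interface Task {
        String name();
        void execute(Map<String, String> scratch);
    }

    /** Stands in for Stage-0: writes the aggregated stats rows. */
    static final Task PRODUCER = new Task() {
        @Override public String name() { return "Stage-0"; }
        @Override public void execute(Map<String, String> scratch) {
            scratch.put("stats-rows", "key,value");
        }
    };

    /** Stands in for Stage-2: assumed to read what Stage-0 wrote. */
    static final Task CONSUMER = new Task() {
        @Override public String name() { return "Stage-2"; }
        @Override public void execute(Map<String, String> scratch) {
            if (!scratch.containsKey("stats-rows")) {
                throw new IllegalStateException(name() + " found no output from Stage-0");
            }
        }
    };

    /** Runs root tasks strictly in list order; returns "ok" or the failure message. */
    static String runInOrder(List<Task> rootTasks) {
        Map<String, String> scratch = new HashMap<>();
        try {
            for (Task task : rootTasks) {
                task.execute(scratch);
            }
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        List<Task> roots = new ArrayList<>(List.of(PRODUCER, CONSUMER));
        System.out.println(runInOrder(roots)); // prints "ok"
        Collections.swap(roots, 0, 1);         // simulate the reordering
        System.out.println(runInOrder(roots)); // prints "Stage-2 found no output from Stage-0"
    }
}
```

In this model, recording the edge explicitly (making the consumer depend on the producer) would remove the sensitivity to list order.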
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)