Posted to dev@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2017/05/15 20:47:04 UTC
[jira] [Created] (HIVE-16668) Hive on Spark generates incorrect plan and result with window function and lateral view
Chao Sun created HIVE-16668:
-------------------------------
Summary: Hive on Spark generates incorrect plan and result with window function and lateral view
Key: HIVE-16668
URL: https://issues.apache.org/jira/browse/HIVE-16668
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chao Sun
Assignee: Chao Sun
To reproduce:
{code}
create table t1 (a string);
create table t2 (a array<string>);
create table dummy (a string);
insert into table dummy values ("a");
insert into t1 values ("1"), ("2");
insert overwrite table t2 select array("1", "2", "3", "4") from dummy;
set hive.auto.convert.join.noconditionaltask.size=3;
explain
with tt1 as (
  select a as id, count(*) over () as count
  from t1
),
tt2 as (
  select id
  from t2
  lateral view outer explode(a) a_tbl as id
)
select tt1.count
from tt1 join tt2 on tt1.id = tt2.id;
{code}
For Hive on Spark, the plan is:
{code}
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-1 depends on stages: Stage-2
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Spark
Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)
DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:9
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: 0 (type: int)
sort order: +
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
value expressions: a (type: string)
Reducer 2
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Select Operator
expressions: VALUE._col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
PTF Operator
Function definitions:
Input definition
input alias: ptf_0
output shape: _col0: string
type: WINDOWING
Windowing table definition
input alias: ptf_1
name: windowingtablefunction
order by: 0 ASC NULLS FIRST
partition by: 0
raw input shape:
window functions:
window function definition
alias: count_window_0
name: count
window function: GenericUDAFCountEvaluator
window frame: PRECEDING(MAX)~FOLLOWING(MAX)
isStar: true
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), count_window_0 (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Stage: Stage-1
Spark
DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:8
Vertices:
Map 3
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Lateral View Forward
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Select Operator
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Lateral View Join Operator
outputColumnNames: _col4
Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col4 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col1
input vertices:
0 Reducer 2
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col1 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Select Operator
expressions: a (type: array<string>)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
UDTF Operator
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
function name: explode
outer lateral view: true
Filter Operator
predicate: col is not null (type: boolean)
Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
Lateral View Join Operator
outputColumnNames: _col4
Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col4 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col1
input vertices:
0 Reducer 2
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col1 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
Note that {{Map 1}} is listed twice as an input to {{Reducer 2}} in the {{Edges}} section.
The result for this query is:
{code}
4
4
4
4
{code}
for Hive on Spark, which is not correct: {{t1}} has only two rows, so the query should return two rows, each with a count of 2.
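
As a sanity check on the numbers, the query semantics can be modelled in a few lines of plain Python. This is only a sketch, not Hive code: {{run_query}} is a hypothetical helper written for this illustration, and feeding {{t1}} in twice is an assumption about what the duplicated {{Map 1}} edge does, not something confirmed from the Hive source.

```python
def run_query(t1_rows, t2_arrays):
    """Model of: select tt1.count from tt1 join tt2 on tt1.id = tt2.id."""
    # tt1: each t1 row paired with count(*) over (), i.e. the total row count
    total = len(t1_rows)
    tt1 = [(a, total) for a in t1_rows]

    # tt2: lateral view outer explode(a); an empty or NULL array yields one NULL id
    tt2 = []
    for arr in t2_arrays:
        if arr:
            tt2.extend(arr)
        else:
            tt2.append(None)

    # inner join on id; a NULL id never matches
    return [
        count
        for tt2_id in tt2
        for (tt1_id, count) in tt1
        if tt2_id is not None and tt1_id == tt2_id
    ]


t1 = ["1", "2"]
t2 = [["1", "2", "3", "4"]]

expected = run_query(t1, t2)         # t1 read once
duplicated = run_query(t1 + t1, t2)  # assumption: t1 reaches the reducer twice

print(expected)    # [2, 2]
print(duplicated)  # [4, 4, 4, 4]
```

Under that assumption the model reproduces the observed output: the window sees four rows instead of two, so every count becomes 4, and each matching id joins twice. This is consistent with the duplicated edge being the cause, but it does not prove it.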