You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "qingfa zhou (Jira)" <ji...@apache.org> on 2019/09/10 04:40:00 UTC
[jira] [Updated] (HIVE-22181) Same query but different result
occasionally
[ https://issues.apache.org/jira/browse/HIVE-22181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
qingfa zhou updated HIVE-22181:
-------------------------------
Description:
h3. 1.Phenomenon
Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively
1)
OK
3951864 808
2)
OK
1822979 353
But the first result is true.
h3. 2.SQL
select
count(1),count(distinct store_code)
from
(
select
store_code,
all_pay_id,
payable_price,
row_number()over(partition by store_code order by payable_price desc) as rw,
count(1)over(partition by store_code) as store_user
from
(
select
store_code,
all_pay_id,
sum(payable_price) as payable_price
from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
where dt='20190904'
group by store_code,all_pay_id
) a
order by rw
) a
where rw/store_user<=0.8
;
h3. 3.Troubleshooting
I lookup task result from tez web ui,give the result as follows:
the first:
the second:
!image-2019-09-10-12-18-08-769.png!
so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first.
This is detailed info of the second result from reducer_3 to reducer_4
!image-2019-09-10-12-26-41-681.png!
This is execution plan of the sql:
OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 6
File Output Operator [FS_24]
Group By Operator [GBY_22] (rows=1 width=128)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"]
<-Reducer 5 [SIMPLE_EDGE]
SHUFFLE [RS_21]
Group By Operator [GBY_20] (rows=5619870 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0
Select Operator [SEL_17] (rows=5619870 width=2328)
Output:["_col0"]
<-Reducer 4 [SIMPLE_EDGE]
SHUFFLE [RS_16]
Select Operator [SEL_14] (rows=5619870 width=2328)
Output:["_col0","_col3"]
Filter Operator [FIL_25] (rows=5619870 width=2328)
predicate:((_col0 / count_window_1) <= 0.8)
PTF Operator [PTF_13] (rows=16859610 width=2328)
Function definitions:[{},\\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
Select Operator [SEL_12] (rows=16859610 width=2328)
Output:["_col0","_col1"]
<-Reducer 3 [SIMPLE_EDGE]
SHUFFLE [RS_11]
PartitionCols:_col0
Select Operator [SEL_10] (rows=16859610 width=2328)
Output:["_col0","row_number_window_0"]
PTF Operator [PTF_9] (rows=16859610 width=2328)
Function definitions:[{},\\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
Select Operator [SEL_8] (rows=16859610 width=2328)
Output:["_col0","_col2"]
<-Reducer 2 [SIMPLE_EDGE]
SHUFFLE [RS_7]
PartitionCols:_col0
Select Operator [SEL_6] (rows=16859610 width=2328)
Output:["_col0","_col2"]
Group By Operator [GBY_5] (rows=16859610 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1
<-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_4]
PartitionCols:_col0, _col1
Group By Operator [GBY_3] (rows=33719220 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id
Select Operator [SEL_2] (rows=33719220 width=2328)
Output:["store_code","all_pay_id","payable_price"]
TableScan [TS_0] (rows=33719220 width=2328)
data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]
Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me.
was:
h3. 1.Phenomenon
Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively
1)
OK
3951864 808
2)
OK
1822979 353
But the first result is true.
h3. 2.SQL
select
count(1),count(distinct store_code)
from
(
select
store_code,
all_pay_id,
payable_price,
row_number()over(partition by store_code order by payable_price desc) as rw,
count(1)over(partition by store_code) as store_user
from
(
select
store_code,
all_pay_id,
sum(payable_price) as payable_price
from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
where dt='20190904'
group by store_code,all_pay_id
) a
order by rw
) a
where rw/store_user<=0.8
;
h3. 3.Troubleshooting
I lookup task result from tez web ui,give the result as follows:
the first: !image-2019-09-10-12-16-45-385.png!
the second:
!image-2019-09-10-12-18-08-769.png!
so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first.
This is detailed info of the second result from reducer_3 to reducer_4
!image-2019-09-10-12-26-41-681.png!
This is execution plan of the sql:
OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 6
File Output Operator [FS_24]
Group By Operator [GBY_22] (rows=1 width=128)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"]
<-Reducer 5 [SIMPLE_EDGE]
SHUFFLE [RS_21]
Group By Operator [GBY_20] (rows=5619870 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0
Select Operator [SEL_17] (rows=5619870 width=2328)
Output:["_col0"]
<-Reducer 4 [SIMPLE_EDGE]
SHUFFLE [RS_16]
Select Operator [SEL_14] (rows=5619870 width=2328)
Output:["_col0","_col3"]
Filter Operator [FIL_25] (rows=5619870 width=2328)
predicate:((_col0 / count_window_1) <= 0.8)
PTF Operator [PTF_13] (rows=16859610 width=2328)
Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
Select Operator [SEL_12] (rows=16859610 width=2328)
Output:["_col0","_col1"]
<-Reducer 3 [SIMPLE_EDGE]
SHUFFLE [RS_11]
PartitionCols:_col0
Select Operator [SEL_10] (rows=16859610 width=2328)
Output:["_col0","row_number_window_0"]
PTF Operator [PTF_9] (rows=16859610 width=2328)
Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
Select Operator [SEL_8] (rows=16859610 width=2328)
Output:["_col0","_col2"]
<-Reducer 2 [SIMPLE_EDGE]
SHUFFLE [RS_7]
PartitionCols:_col0
Select Operator [SEL_6] (rows=16859610 width=2328)
Output:["_col0","_col2"]
Group By Operator [GBY_5] (rows=16859610 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1
<-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_4]
PartitionCols:_col0, _col1
Group By Operator [GBY_3] (rows=33719220 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id
Select Operator [SEL_2] (rows=33719220 width=2328)
Output:["store_code","all_pay_id","payable_price"]
TableScan [TS_0] (rows=33719220 width=2328)
data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]
Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me.
> Same query but different result occasionally
> --------------------------------------------
>
> Key: HIVE-22181
> URL: https://issues.apache.org/jira/browse/HIVE-22181
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.3.0
> Environment: hadoop:2.7.3.2.6.1.0-129
> hive:2.3.0
> tez:0.8.4
> java:1.8.0_212
> Reporter: qingfa zhou
> Priority: Major
> Attachments: WX20190910-123604.png, WX20190910-123621.png, WX20190910-123641.png
>
>
> h3. 1.Phenomenon
> Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively
> 1)
> OK
> 3951864 808
> 2)
> OK
> 1822979 353
> But the first result is true.
> h3. 2.SQL
> select
> count(1),count(distinct store_code)
> from
> (
> select
> store_code,
> all_pay_id,
> payable_price,
> row_number()over(partition by store_code order by payable_price desc) as rw,
> count(1)over(partition by store_code) as store_user
> from
> (
> select
> store_code,
> all_pay_id,
> sum(payable_price) as payable_price
> from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
> where dt='20190904'
> group by store_code,all_pay_id
> ) a
> order by rw
> ) a
> where rw/store_user<=0.8
> ;
>
> h3. 3.Troubleshooting
> I lookup task result from tez web ui,give the result as follows:
> the first:
> the second:
> !image-2019-09-10-12-18-08-769.png!
> so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first.
> This is detailed info of the second result from reducer_3 to reducer_4
> !image-2019-09-10-12-26-41-681.png!
> This is execution plan of the sql:
> OK
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 6
> File Output Operator [FS_24]
> Group By Operator [GBY_22] (rows=1 width=128)
> Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"]
> <-Reducer 5 [SIMPLE_EDGE]
> SHUFFLE [RS_21]
> Group By Operator [GBY_20] (rows=5619870 width=2328)
> Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0
> Select Operator [SEL_17] (rows=5619870 width=2328)
> Output:["_col0"]
> <-Reducer 4 [SIMPLE_EDGE]
> SHUFFLE [RS_16]
> Select Operator [SEL_14] (rows=5619870 width=2328)
> Output:["_col0","_col3"]
> Filter Operator [FIL_25] (rows=5619870 width=2328)
> predicate:((_col0 / count_window_1) <= 0.8)
> PTF Operator [PTF_13] (rows=16859610 width=2328)
> Function definitions:[{},\\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
> Select Operator [SEL_12] (rows=16859610 width=2328)
> Output:["_col0","_col1"]
> <-Reducer 3 [SIMPLE_EDGE]
> SHUFFLE [RS_11]
> PartitionCols:_col0
> Select Operator [SEL_10] (rows=16859610 width=2328)
> Output:["_col0","row_number_window_0"]
> PTF Operator [PTF_9] (rows=16859610 width=2328)
> Function definitions:[{},\\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
> Select Operator [SEL_8] (rows=16859610 width=2328)
> Output:["_col0","_col2"]
> <-Reducer 2 [SIMPLE_EDGE]
> SHUFFLE [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_6] (rows=16859610 width=2328)
> Output:["_col0","_col2"]
> Group By Operator [GBY_5] (rows=16859610 width=2328)
> Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
> SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=33719220 width=2328)
> Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id
> Select Operator [SEL_2] (rows=33719220 width=2328)
> Output:["store_code","all_pay_id","payable_price"]
> TableScan [TS_0] (rows=33719220 width=2328)
> data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]
>
> Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me.
>
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)