You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "qingfa zhou (Jira)" <ji...@apache.org> on 2020/04/09 12:07:00 UTC
[jira] [Assigned] (HIVE-23165) Hive On Spark left join and right
join generated inconsistent data
[ https://issues.apache.org/jira/browse/HIVE-23165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
qingfa zhou reassigned HIVE-23165:
----------------------------------
> Hive On Spark left join and right join generated inconsistent data
> -------------------------------------------------------------------
>
> Key: HIVE-23165
> URL: https://issues.apache.org/jira/browse/HIVE-23165
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.2.0
> Environment: hive :2.3.0
> spark:2.2.0
> hadoop:2.7.3
> Reporter: qingfa zhou
> Assignee: Xuefu Zhang
> Priority: Major
> Labels: newbie
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> *1)This is my sql.*
> with delivery_day as (
> select * from (
> select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
> from data_smartorder.dm_ordering_information_system_order_detail_parse t
> lateral view json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
> where dt=date_format(date_sub(current_date,1),'yyyyMMdd')
> and l1_category_name='策略配置'
> and l2_category_name='pb仓库补货仓品维度新'
> and b.is_pici_order='1'
> )t
> ),
> avg_sale_7 as (
> select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
> from (
> select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name sku_main_name,
> sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
> sum(t1.sku_sale_quantity) as sku_sale_quantity,
> sum(t1.first_dilivery_quantity) as first_dilivery_quantity
> from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
> where t1.dt=date_format(date_sub(current_date,1),'yyyyMMdd')
> group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
> )t
> )
> select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
> t2.avg_sale_7
> from delivery_day t1
> left join avg_sale_7 t2
> on t1.warehouse_code=t2.warehouse_code
> and t1.sku_main_code=t2.sku_main_code
> where t1.sku_main_code='37010832'
> and t1.out_warehouse_code='1011';
> left join and right join generated inconsistent data.
> 2) result in the left join
> 7001 37010832 1011 26.8572
> 1011 37010832 1011 130.2858
> 2002 37010832 1011 40
> 1701 37010832 1011 NULL
> 3) result in the right join
> 1011 37010832 1011 65.1429
> 2002 37010832 1011 20
> 7001 37010832 1011 13.4286
> Inconsistent results in last column,'right join' 's result is right.But the results of hive on tez and sparksql are consistent and is true.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)