You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "qingfa zhou (Jira)" <ji...@apache.org> on 2020/04/09 12:07:00 UTC

[jira] [Assigned] (HIVE-23165) Hive On Spark left join and right join generated inconsistent data

     [ https://issues.apache.org/jira/browse/HIVE-23165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qingfa zhou reassigned HIVE-23165:
----------------------------------


>  Hive On Spark left join and right join generated inconsistent data
> -------------------------------------------------------------------
>
>                 Key: HIVE-23165
>                 URL: https://issues.apache.org/jira/browse/HIVE-23165
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.2.0
>         Environment: hive :2.3.0
> spark:2.2.0
> hadoop:2.7.3
>            Reporter: qingfa zhou
>            Assignee: Xuefu Zhang
>            Priority: Major
>              Labels: newbie
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> *1)This is my sql.*
> with delivery_day as (
>  select * from (
>  select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
>  from data_smartorder.dm_ordering_information_system_order_detail_parse t
>  lateral view json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
>  where dt=date_format(date_sub(current_date,1),'yyyyMMdd')
>  and l1_category_name='策略配置'
>  and l2_category_name='pb仓库补货仓品维度新'
>  and b.is_pici_order='1'
>  )t
> ),
> avg_sale_7 as (
>  select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
>  from (
>  select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name sku_main_name,
>  sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
>  sum(t1.sku_sale_quantity) as sku_sale_quantity,
>  sum(t1.first_dilivery_quantity) as first_dilivery_quantity
>  from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
>  where t1.dt=date_format(date_sub(current_date,1),'yyyyMMdd')
>  group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
>  )t
> )
>  select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
>  t2.avg_sale_7
>  from delivery_day t1
>  left join avg_sale_7 t2
>  on t1.warehouse_code=t2.warehouse_code
>  and t1.sku_main_code=t2.sku_main_code
>  where t1.sku_main_code='37010832'
>  and t1.out_warehouse_code='1011';
> left join and right join generated inconsistent data.
> 2) result in the left join 
> 7001  37010832  1011  26.8572
> 1011  37010832  1011  130.2858
> 2002  37010832  1011  40
> 1701  37010832  1011  NULL
> 3) result in the right join 
> 1011  37010832  1011  65.1429
> 2002  37010832  1011  20
> 7001  37010832  1011  13.4286
> Inconsistent results in last column,'right join' 's result is right.But the results of hive on tez and sparksql are consistent and is true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)