You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user-zh@flink.apache.org by 史正超 <79...@qq.com> on 2020/11/04 11:27:57 UTC

回复: union all 丢失部分数据

你的union all上面，也就是insert into 的第一条select 末尾有 分号 ‘;’,， 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉，我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。
我觉得你应该这样组装 你的sql :
```sql

Insert into xxx
Select
       d1,
       d2,
       count(1)
From (
       Select * from a
       Union all
       Select * from b,
)
Group by d1, d2

```

发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用

发件人: 夜思流年梦<ma...@163.com>
发送时间: 2020年11月4日 18:21
收件人: user-zh@flink.apache.org<ma...@flink.apache.org>
主题: union all 丢失部分数据

开发者好：
               目前有此场景：算不同部门的总收入和所有部门的总收入，打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据，要么是各个部门的数据少了，要么是所有部门的总收入少了
               如果把union  all 的两段SQL 分别独立出来，插入同一张表，那么数据就是正常的，不知道是否是bug还是使用方法不对




原sql :


insert into dws_XXXX


select
0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
,case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end as paytype
,count(orderid) as paynum_h
,round(sum(amt)) as paymoney_h
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by
DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end ;




union all




select 0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
,'all' as paytype
,count(orderid) as paynum_h
,round(sum(amt)) as paymoney_h
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;

Re:回复: union all 丢失部分数据

Posted by 夜思流年梦 <li...@163.com>.






哦，不好意思，因为我把这条SQL是分成两段了，所以在恢复原SQL的时候没有把 分号去掉 ;


当时在union的时候，中间是不会有分号的，不然也提不上去











在 2020-11-05 10:00:01，"史 正超" <79...@qq.com> 写道：
>你的union all上面，也就是insert into 的第一条select 末尾有 分号 ‘;’,， 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉，我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。
>我觉得你应该这样组装 你的sql :
>```sql
>
>Insert into xxx
>Select
>       d1,
>       d2,
>       count(1)
>From (
>       Select * from a
>       Union all
>       Select * from b,
>)
>Group by d1, d2
>
>```
>
>发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用
>
>发件人: 夜思流年梦<ma...@163.com>
>发送时间: 2020年11月4日 18:21
>收件人: user-zh@flink.apache.org<ma...@flink.apache.org>
>主题: union all 丢失部分数据
>
>开发者好：
>               目前有此场景：算不同部门的总收入和所有部门的总收入，打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据，要么是各个部门的数据少了，要么是所有部门的总收入少了
>               如果把union  all 的两段SQL 分别独立出来，插入同一张表，那么数据就是正常的，不知道是否是bug还是使用方法不对
>
>
>
>
>原sql :
>
>
>insert into dws_XXXX
>
>
>select
>0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
>,case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end as paytype
>,count(orderid) as paynum_h
>,round(sum(amt)) as paymoney_h
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by
>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
>case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end ;
>
>
>
>
>union all
>
>
>
>
>select 0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
>,'all' as paytype
>,count(orderid) as paynum_h
>,round(sum(amt)) as paymoney_h
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;
>