
Posted to user-zh@flink.apache.org by 雨后彩虹 <29...@qq.com> on 2020/10/30 03:45:21 UTC

How to deal with state TTL not taking effect when mini-batch is enabled

Hi all!
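Flink version: 1.9
Requirement: compute each user's daily order totals. Orders can be modified, so the same order may have multiple records; only the latest record of each order, ranked by modification time, should be counted.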
Query:
select userId, sum(money) as result, ymd from (
  select userId, order_id, money, DATE_FORMAT(trans_time, 'yyyyMMdd') as ymd,
    row_number() over (partition by order_id order by last_modify_time desc) as rk
  from MyTable where type = '1'
) t where t.rk = 1 group by userId, ymd;
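To make the query's state concrete, here is a minimal, hypothetical plain-Java model of what it computes incrementally (not Flink code; `LatestOrderDailySum`, `Order`, `onRecord` and the other names are invented for illustration). The `rk = 1` step keeps the latest version of each `order_id`; when a newer version arrives, the replaced version is retracted from its `(userId, ymd)` sum before the new one is added. How ties on `last_modify_time` are broken is an assumption of the model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plain-Java model of the query's two stateful steps; not Flink code. */
public class LatestOrderDailySum {

    /** One version of an order; fields mirror the columns used in the query. */
    public static final class Order {
        final String orderId;
        final String userId;
        final long money;
        final String ymd;            // DATE_FORMAT(trans_time, 'yyyyMMdd')
        final long lastModifyTime;

        public Order(String orderId, String userId, long money, String ymd, long lastModifyTime) {
            this.orderId = orderId;
            this.userId = userId;
            this.money = money;
            this.ymd = ymd;
            this.lastModifyTime = lastModifyTime;
        }
    }

    // "rk = 1" step: the latest version seen for each order_id (never shrinks on its own).
    private final Map<String, Order> latestByOrder = new HashMap<>();
    // "group by userId, ymd" step: running sum(money) per group.
    private final Map<String, Long> sumByUserDay = new HashMap<>();

    private static String groupKey(String userId, String ymd) {
        return userId + "|" + ymd;
    }

    /** Applies one order version, retracting the version it replaces. */
    public void onRecord(Order incoming) {
        Order previous = latestByOrder.get(incoming.orderId);
        // Assumption: on equal last_modify_time the version already seen stays rank 1.
        if (previous != null && previous.lastModifyTime >= incoming.lastModifyTime) {
            return;
        }
        if (previous != null) {
            // Retract the replaced version from its (possibly different) group.
            sumByUserDay.merge(groupKey(previous.userId, previous.ymd), -previous.money, Long::sum);
        }
        latestByOrder.put(incoming.orderId, incoming);
        sumByUserDay.merge(groupKey(incoming.userId, incoming.ymd), incoming.money, Long::sum);
    }

    public long sumFor(String userId, String ymd) {
        return sumByUserDay.getOrDefault(groupKey(userId, ymd), 0L);
    }

    public int retainedOrderKeys() {
        return latestByOrder.size();
    }

    public static void main(String[] args) {
        LatestOrderDailySum job = new LatestOrderDailySum();
        job.onRecord(new Order("o1", "u1", 100L, "20201030", 1L));
        job.onRecord(new Order("o2", "u1", 50L, "20201030", 2L));
        job.onRecord(new Order("o1", "u1", 80L, "20201030", 3L)); // o1 modified: 100 -> 80
        System.out.println(job.sumFor("u1", "20201030")); // 130
        System.out.println(job.retainedOrderKeys());      // 2
    }
}
```

In this model `latestByOrder` gains one entry per distinct `order_id` and is never cleaned up, which is the kind of per-key state that only idle state retention can bound.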
Configuration: tableConfig.setIdleStateRetentionTime(Time.milliseconds(3600000), Time.milliseconds(390000)); (intended to expire idle state after 1 hour)
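For reference, the two arguments of `setIdleStateRetentionTime` are the minimum and maximum time that idle (not updated) state is retained: a key's state is not cleared before it has been idle for the minimum time, and it is not kept once it has been idle for longer than the maximum. The sketch below is a simplified, hypothetical model of a timer-based cleanup scheme, assuming it resembles the one used by Flink's non-mini-batch operators. `IdleStateRetentionModel` and its methods are invented names, the retention values in `main` are assumed for illustration, and `cleanupEnabled = false` stands in for an operator path that ignores the setting, as this thread reports for mini-batch mode in [1].

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of timer-based idle state retention.
 * Assumption: it resembles the min/max scheme of Flink's non-mini-batch operators; not Flink code.
 */
public class IdleStateRetentionModel {

    private final long minRetentionMs;
    private final long maxRetentionMs;
    // false stands in for an operator path that ignores the retention setting
    private final boolean cleanupEnabled;
    // Per-key state, plus the processing time at which each key's cleanup timer fires.
    private final Map<String, Long> state = new HashMap<>();
    private final Map<String, Long> cleanupTimer = new HashMap<>();

    public IdleStateRetentionModel(long minRetentionMs, long maxRetentionMs, boolean cleanupEnabled) {
        if (maxRetentionMs < minRetentionMs) {
            throw new IllegalArgumentException("this model requires maxRetention >= minRetention");
        }
        this.minRetentionMs = minRetentionMs;
        this.maxRetentionMs = maxRetentionMs;
        this.cleanupEnabled = cleanupEnabled;
    }

    /** Writes the state of `key` at processing time `now` and refreshes its cleanup timer if needed. */
    public void access(String key, long value, long now) {
        state.put(key, value);
        if (!cleanupEnabled) {
            return;
        }
        Long current = cleanupTimer.get(key);
        // Register a later timer only if the existing one would fire before now + minRetention;
        // overwriting the map entry plays the role of deleting the old timer.
        if (current == null || now + minRetentionMs > current) {
            cleanupTimer.put(key, now + maxRetentionMs);
        }
    }

    /** Advances processing time and clears the state of every key whose cleanup timer has fired. */
    public void advanceTo(long now) {
        cleanupTimer.entrySet().removeIf(timer -> {
            boolean fired = timer.getValue() <= now;
            if (fired) {
                state.remove(timer.getKey());
            }
            return fired;
        });
    }

    public int retainedKeys() {
        return state.size();
    }

    public static void main(String[] args) {
        // Illustrative values (assumed): min = 60 min, max = 65 min.
        IdleStateRetentionModel withCleanup = new IdleStateRetentionModel(3_600_000L, 3_900_000L, true);
        IdleStateRetentionModel withoutCleanup = new IdleStateRetentionModel(3_600_000L, 3_900_000L, false);
        for (IdleStateRetentionModel m : new IdleStateRetentionModel[] {withCleanup, withoutCleanup}) {
            m.access("o1", 100L, 0L);
            m.access("o2", 50L, 0L);
            m.access("o2", 60L, 3_000_000L); // o2 stays active, o1 goes idle
            m.advanceTo(4_000_000L);         // past o1's timer at 3_900_000
        }
        System.out.println(withCleanup.retainedKeys());    // 1
        System.out.println(withoutCleanup.retainedKeys()); // 2
    }
}
```

With cleanup, the idle key `o1` is dropped once its timer at 65 minutes fires; without it, every key stays forever, which matches a checkpoint that only ever grows.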
Symptom: the checkpoint size keeps growing (presumably because the state TTL is not taking effect and expired state is never cleaned up).
Question: Searching JIRA, I found that this problem has already been reported [1]. Is there any workaround for it?


[1] https://issues.apache.org/jira/browse/FLINK-17096

Re: How to deal with state TTL not taking effect when mini-batch is enabled

Posted by 刘大龙 <ld...@zju.edu.cn>.

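As far as I can tell, there is no workaround on 1.9. You could try the master branch: merge the PR for this issue into it and build Flink yourself.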

