Posted to user-zh@flink.apache.org by guanyq <dl...@163.com> on 2022/03/05 01:46:46 UTC

flink1.14.0 temporal join hive

A question about cache refreshing when joining a Kafka real-time stream against the latest partition of a Hive table.


'streaming-source.monitor-interval'='12 h'
My understanding of this option is: counting from the job's start time, the latest partition is re-read every 12 hours, is that right?
A further question: if, in the interval between two reads of the latest partition, records arrive on the real-time stream that should join against a newer partition, does that mean those records are effectively still joined against the previously loaded latest partition?


SET table.sql-dialect=hive;
CREATE TABLE dimension_table (
  product_id STRING,
  product_name STRING,
  unit_price DECIMAL(10, 4),
  pv_count BIGINT,
  like_count BIGINT,
  comment_count BIGINT,
  update_time TIMESTAMP(3),
  update_user STRING,
  ...
) PARTITIONED BY (pt_year STRING, pt_month STRING, pt_day STRING) TBLPROPERTIES (
  -- using default partition-name order to load the latest partition every 12h (the most recommended and convenient way)
  'streaming-source.enable' = 'true',
  'streaming-source.partition.include' = 'latest',
  'streaming-source.monitor-interval' = '12 h',
  'streaming-source.partition-order' = 'partition-name',  -- option with default value, can be ignored.
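(For reference, a dimension table declared like the one above is normally consumed on the probe side with a processing-time temporal join. A minimal sketch follows; the probe table orders_table, its columns, and its proctime attribute are assumptions for illustration, not taken from this thread.)

```sql
-- Probe side: switch back to the default dialect for the streaming query.
SET table.sql-dialect=default;

-- orders_table and o.proctime are hypothetical; each order row is joined
-- against the latest-loaded Hive partition as of its processing time.
SELECT o.order_id, o.amount, dim.product_name, dim.unit_price
FROM orders_table AS o
JOIN dimension_table FOR SYSTEM_TIME AS OF o.proctime AS dim
  ON o.product_id = dim.product_id;
```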

Re: flink1.14.0 temporal join hive

Posted by mack143 <ma...@163.com>.
Unsubscribe
On 2022-03-05 09:46:46, "guanyq" <dl...@163.com> wrote: