Posted to user-zh@flink.apache.org by macdoor <ma...@gmail.com> on 2021/01/30 09:54:37 UTC

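Can a Hive stream be joined with a Hive stream?

Here is the concrete requirement: the cumulative traffic of each channel is collected every 5 minutes and written to a Hive table. To get the traffic that passed through a channel within a given 5-minute interval, I need to subtract the previous sample's total from the current one. I would like to join the Hive table with itself, so that two Hive streams are joined. Is this possible, or is there another way to do it?
I currently run this in batch mode on a crontab schedule and would like to switch to stream mode.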

select p1.traffic -p2.traffic
from p as p1
inner join p as p2 on p1.id=p2.id and p1.time=p2.time + interval 5 minutes
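
To make the intended result concrete, here is a minimal Python sketch of what the self-join above computes: each sample is paired with the sample of the same id taken exactly 5 minutes earlier, and the two cumulative totals are subtracted. The sample rows and the `traffic_deltas` helper are made up for illustration and are not part of any Flink or Hive API.

```python
from datetime import datetime, timedelta

# Hypothetical rows of table p: (id, time, cumulative traffic). Values are made up.
rows = [
    ("ch1", datetime(2021, 1, 30, 9, 0), 1000),
    ("ch1", datetime(2021, 1, 30, 9, 5), 1300),
    ("ch1", datetime(2021, 1, 30, 9, 10), 1450),
    ("ch2", datetime(2021, 1, 30, 9, 0), 50),
    ("ch2", datetime(2021, 1, 30, 9, 5), 80),
]


def traffic_deltas(rows, step=timedelta(minutes=5)):
    """Pair each sample (p1) with the same id's sample taken `step` earlier (p2)."""
    # Index the "p2" side by (id, time) so each lookup is an exact-match join.
    by_key = {(cid, ts): total for cid, ts, total in rows}
    deltas = []
    for cid, ts, total in rows:
        previous = by_key.get((cid, ts - step))
        if previous is not None:  # inner join: samples without a predecessor are dropped
            deltas.append((cid, ts, total - previous))
    return deltas


deltas = traffic_deltas(rows)
for cid, ts, delta in deltas:
    print(cid, ts.isoformat(), delta)
```

The first sample of each channel produces no output row, which matches the inner-join semantics of the query.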



--
Sent from: http://apache-flink.147419.n8.nabble.com/

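Re: Can a Hive stream be joined with a Hive stream?

Posted by yidan zhao <hi...@gmail.com>.

A question: is the lack of watermark support on Hive tables related to window TVFs not supporting batch mode?
Does this mean window TVFs currently cannot be used for windowed aggregation over a Hive table, and is that also because Hive tables do not support time attributes (event time + watermark)?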

On Mon, Feb 1, 2021 at 2:24 PM, Leonard Xu <xb...@gmail.com> wrote:

> Not yet. You can follow this issue [1].
>
> Best,
> Leonard
> [1] https://issues.apache.org/jira/browse/FLINK-21183
>
> > On Feb 1, 2021, at 13:29, macdoor <ma...@gmail.com> wrote:
> >
> > Does the current 1.13-snapshot support this yet? Can I try it?

Re: Can a Hive stream be joined with a Hive stream?

Posted by Leonard Xu <xb...@gmail.com>.

Not yet. You can follow this issue [1].

Best,
Leonard
[1] https://issues.apache.org/jira/browse/FLINK-21183

> On Feb 1, 2021, at 13:29, macdoor <ma...@gmail.com> wrote:
>
> Does the current 1.13-snapshot support this yet? Can I try it?

Re: Can a Hive stream be joined with a Hive stream?

Posted by macdoor <ma...@gmail.com>.

Does the current 1.13-snapshot support this yet? Can I try it?

Re: Can a Hive stream be joined with a Hive stream?

Posted by Leonard Xu <xb...@gmail.com>.

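Okay, that matches my understanding: this time field is event time. An interval join based on event time requires a watermark to be defined, and Hive tables do not support defining a watermark yet. 1.13 should support it.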
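
As a rough illustration of why the watermark matters here: in an event-time join, the watermark is what tells the operator that no older rows are still expected, so a buffered row can be matched and then released from state. The Python sketch below only models that idea; it is not Flink's implementation, and the class name, the `can_expire` helper, and the 1-minute out-of-orderness bound are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


class BoundedOutOfOrdernessWatermark:
    """Models a watermark as (largest event time seen) minus a fixed delay."""

    def __init__(self, max_out_of_orderness):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_event_time = None

    def observe(self, event_time):
        # Only ever moves forward: late rows do not pull the watermark back.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    def current(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_out_of_orderness


def can_expire(row_time, watermark, join_interval=timedelta(minutes=5)):
    """A buffered row at row_time waits for its partner at row_time + join_interval.

    Once the watermark has reached that point, no matching row can still
    arrive, so the buffered row may be dropped from state.
    """
    return watermark is not None and watermark >= row_time + join_interval


wm = BoundedOutOfOrdernessWatermark(timedelta(minutes=1))
wm.observe(datetime(2021, 1, 30, 9, 0))
wm.observe(datetime(2021, 1, 30, 9, 5))
# Watermark is 09:04, so the 09:00 row must stay buffered.
early = can_expire(datetime(2021, 1, 30, 9, 0), wm.current())
wm.observe(datetime(2021, 1, 30, 9, 7))
# Watermark is 09:06, so the 09:00 row can be released.
late = can_expire(datetime(2021, 1, 30, 9, 0), wm.current())
```

Without a watermark the operator has no such signal, which is why a table that cannot declare one cannot take part in an event-time interval join.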



> On Feb 1, 2021, at 10:58, macdoor <ma...@gmail.com> wrote:
>
> p1.time is the time stored in the data record, and the table is also partitioned by this time.

Re: Can a Hive stream be joined with a Hive stream?

Posted by macdoor <ma...@gmail.com>.

p1.time is the time stored in the data record, and the table is also partitioned by this time.

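Re: Can a Hive stream be joined with a Hive stream?

Posted by Leonard Xu <xb...@gmail.com>.

Hi, macdoor

This is an interesting case. Is the p1.time field the time stored in your records? How does the partition column of your Hive table relate to this time field?


> On Jan 30, 2021, at 17:54, macdoor <ma...@gmail.com> wrote:
>
> Here is the concrete requirement: the cumulative traffic of each channel is collected every 5 minutes and written to a Hive table. To get the traffic that passed through a channel within a given 5-minute interval, I need to subtract the previous sample's total from the current one. I would like to join the Hive table with itself, so that two Hive streams are joined. Is this possible, or is there another way to do it?
> I currently run this in batch mode on a crontab schedule and would like to switch to stream mode.
>
> select p1.traffic -p2.traffic
> from p as p1
> inner join p as p2 on p1.id=p2.id and p1.time=p2.time + interval 5 minutes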