Posted to dev@beam.apache.org by JingsongLee <lz...@aliyun.com> on 2017/01/27 13:33:43 UTC

Re: How to implement Timer in runner

Thanks for the reply. Maybe we need an external priority queue. Happy Chinese New Year!

------------------------------------------------------------------
From: Aljoscha Krettek <al...@apache.org>
Sent: Wednesday, January 25, 2017 18:38
To: dev <de...@beam.apache.org>; lzljs3620320 <lz...@aliyun.com>; Kenneth Knowles <kl...@google.com>
Subject: Re: How to implement Timer in runner

Hi Jingsong,

You're right, it is indeed somewhat tricky to find a good data structure for out-of-core timers. That's why we keep them in memory in Flink for now, and that's also why I'm afraid I don't have any good advice for you right now. We're aware of the problem in Flink but we're not yet working on a concrete solution.

Cheers,
Aljoscha

On Tue, 24 Jan 2017 at 21:42 Dan Halperin <dh...@apache.org> wrote:
Hi Jingsong,

Sorry for the delayed response; this email ended up being misclassified by
my mail server and I missed it. Maybe Kenn or Aljoscha has suggestions on
how runners can best implement timers?

Dan

On Thu, Jan 19, 2017 at 9:55 PM, lzljs3620320 <lz...@aliyun.com>
wrote:

> Hi there,
> I'm working on the Beam integration for an internal system at Alibaba. Currently
> most runners, such as Flink and Apex, keep timers in memory. (I do not know
> the implementation of Google Dataflow.) But in our scenario, the unbounded data
> has a large number of keys, which leads to OOM when timers are kept in memory. So
> we want to store timers in state (RocksDB on disk). The problem is how to
> extract the fired event-time timers when the input watermark is
> refreshed. Do we have to scan all keys and timers? (A timer currently consists of
> key, id, namespace, timestamp, and domain.) Is there a better
> implementation? I'm wondering if you could give me some advice on how to implement
> timers in state efficiently. Thank you!
> Best,
> Jingsong Lee
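
Editor's note: one possible answer to the question above is to keep timers in a structure ordered by timestamp, so that advancing the watermark only touches the timers that are due instead of scanning every key. The following is a minimal in-memory sketch of that idea, not how any Beam runner actually implements it. The names OrderedTimerStore, Timer, set and advanceWatermark are hypothetical, the TreeSet merely stands in for an ordered on-disk store, and the sketch assumes a timer fires once the watermark reaches its timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: timers sorted by timestamp so that a watermark
// advance reads only the due prefix of the ordering.
class OrderedTimerStore {

    // Hypothetical timer record using fields named in the question.
    static final class Timer {
        final String key;
        final String id;
        final String namespace;
        final long timestamp;

        Timer(String key, String id, String namespace, long timestamp) {
            this.key = key;
            this.id = id;
            this.namespace = namespace;
            this.timestamp = timestamp;
        }
    }

    // Order by timestamp first; the remaining fields only break ties so
    // that distinct timers with the same timestamp are all retained.
    private static final Comparator<Timer> BY_TIME =
        Comparator.comparingLong((Timer t) -> t.timestamp)
            .thenComparing(t -> t.key)
            .thenComparing(t -> t.namespace)
            .thenComparing(t -> t.id);

    private final TreeSet<Timer> timers = new TreeSet<>(BY_TIME);

    void set(Timer timer) {
        timers.add(timer);
    }

    int size() {
        return timers.size();
    }

    // Removes and returns every timer whose timestamp is at or before the
    // watermark. Iteration stops at the first timer that is not yet due.
    List<Timer> advanceWatermark(long watermark) {
        List<Timer> fired = new ArrayList<>();
        Iterator<Timer> it = timers.iterator();
        while (it.hasNext()) {
            Timer t = it.next();
            if (t.timestamp > watermark) {
                break;
            }
            fired.add(t);
            it.remove();
        }
        return fired;
    }

    public static void main(String[] args) {
        OrderedTimerStore store = new OrderedTimerStore();
        store.set(new Timer("user-1", "gc", "window-a", 30L));
        store.set(new Timer("user-2", "gc", "window-a", 10L));
        store.set(new Timer("user-3", "gc", "window-a", 20L));
        List<Timer> fired = store.advanceWatermark(20L);
        System.out.println("fired=" + fired.size() + " remaining=" + store.size());
    }
}
```

The cost of a watermark advance in this sketch is proportional to the number of timers that fire, not to the total number of keys. Whether the same holds on disk depends on the store offering an ordered scan.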


Re: How to implement Timer in runner

Posted by JingsongLee <lz...@aliyun.com>.
@小多 Thank you! I'll take a look.

------------------------------------------------------------------
From: 小多 <ki...@gmail.com>
Sent: Friday, January 27, 2017 23:22
To: dev <de...@beam.apache.org>; JingsongLee <lz...@aliyun.com>
Cc: Kenneth Knowles <kl...@google.com>; Aljoscha Krettek <al...@apache.org>
Subject: Re: How to implement Timer in runner

Hi Jingsong,

Take a look at Blink (a fork of Flink at Alibaba Search); they implemented a RocksDBPriorityState and use it for timers.

-- 

Engineer, Geek...



Re: How to implement Timer in runner

Posted by 小多 <ki...@gmail.com>.
Hi Jingsong,

Take a look at Blink (a fork of Flink at Alibaba Search); they
implemented a RocksDBPriorityState and use it for timers.
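
Editor's note: the thread does not describe how RocksDBPriorityState works internally, so the following is only a sketch of one common way to get priority-queue behaviour from a store whose keys are ordered bytewise (RocksDB's default ordering). It encodes the timestamp as a big-endian prefix of the stored key so that byte order matches timestamp order. TimerKeyCodec and its methods are hypothetical names, and the layout (timestamp followed by the user key) is an assumption made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: an order-preserving byte encoding for timer keys.
final class TimerKeyCodec {

    private TimerKeyCodec() {}

    // Layout: 8-byte big-endian timestamp with the sign bit flipped,
    // followed by the UTF-8 bytes of the user key. Flipping the sign bit
    // makes negative timestamps sort before positive ones when the bytes
    // are compared as unsigned values.
    static byte[] encode(long timestamp, String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + keyBytes.length)
            .putLong(timestamp ^ Long.MIN_VALUE)
            .put(keyBytes)
            .array();
    }

    // Recovers the timestamp from the first 8 bytes of an encoded key.
    static long decodeTimestamp(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, mimicking a bytewise key order.
    static int compareBytewise(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] early = encode(-5L, "user-1");
        byte[] late = encode(1000L, "user-1");
        System.out.println(compareBytewise(early, late) < 0);
        System.out.println(decodeTimestamp(late));
    }
}
```

With keys laid out this way, the timers due at a given watermark form a contiguous range at the start of the key space, so an ordered iterator can read them and stop at the first key whose decoded timestamp is past the watermark.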



-- 

Engineer, Geek...