You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Johannes Schulte <jo...@gmail.com> on 2017/03/06 17:06:01 UTC

TTL for State Entries / FLINK-3089

Hi,

I am trying to achieve a stream-to-stream join with big windows and are
searching for a way to clean up state of old keys. I am already using a
RichCoProcessFunction

I found there is already an existing ticket

https://issues.apache.org/jira/browse/FLINK-3089

but I have doubts that a registration of a timer for every incoming event
is feasible as the timers seem to reside in an in-memory queue.

The task is somewhat similar to the following blog post:
http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink

Is the implementation of a custom window operator a necessity for achieving
such functionality

Thanks a lot,

Johannes

Re: TTL for State Entries / FLINK-3089

Posted by Johannes Schulte <jo...@gmail.com>.
Hey Aljoscha,

thank you for your reply. The amount and quality of response on this list
are really great to see and a good way to learn.

I will try this and see how this works out.

Cheers,

Johannes

On Thu, Mar 9, 2017 at 3:55 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi Johannes,
> I think what you can do is not register a timer for every event but for
> every key, with a certain granularity. When that timer fires you check
> what you want to clean up for that key and maybe register another timer
> for the future. This way, the size of your timer state is bounded by
> your key cardinality and I think people have used Flink with
> timers/windows with key cardinalities of several 100 millions.
>
> Best,
> Aljoscha
>
> On Wed, Mar 8, 2017, at 14:37, Ufuk Celebi wrote:
> > Looping in Aljoscha and Kostas who are the expert on this. :-)
> >
> > On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
> > <jo...@gmail.com> wrote:
> > > Hi,
> > >
> > > I am trying to achieve a stream-to-stream join with big windows and are
> > > searching for a way to clean up state of old keys. I am already using a
> > > RichCoProcessFunction
> > >
> > > I found there is already an existing ticket
> > >
> > > https://issues.apache.org/jira/browse/FLINK-3089
> > >
> > > but I have doubts that a registration of a timer for every incoming
> event is
> > > feasible as the timers seem to reside in an in-memory queue.
> > >
> > > The task is somewhat similar to the following blog post:
> > > http://devblog.mediamath.com/real-time-streaming-
> attribution-using-apache-flink
> > >
> > > Is the implementation of a custom window operator a necessity for
> achieving
> > > such functionality
> > >
> > > Thanks a lot,
> > >
> > > Johannes
> > >
> > >
>

Re: TTL for State Entries / FLINK-3089

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Johannes,
I think what you can do is not register a timer for every event but for
every key, with a certain granularity. When that timer fires you check
what you want to clean up for that key and maybe register another timer
for the future. This way, the size of your timer state is bounded by
your key cardinality and I think people have used Flink with
timers/windows with key cardinalities of several 100 millions.

Best,
Aljoscha

On Wed, Mar 8, 2017, at 14:37, Ufuk Celebi wrote:
> Looping in Aljoscha and Kostas who are the expert on this. :-)
> 
> On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
> <jo...@gmail.com> wrote:
> > Hi,
> >
> > I am trying to achieve a stream-to-stream join with big windows and are
> > searching for a way to clean up state of old keys. I am already using a
> > RichCoProcessFunction
> >
> > I found there is already an existing ticket
> >
> > https://issues.apache.org/jira/browse/FLINK-3089
> >
> > but I have doubts that a registration of a timer for every incoming event is
> > feasible as the timers seem to reside in an in-memory queue.
> >
> > The task is somewhat similar to the following blog post:
> > http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink
> >
> > Is the implementation of a custom window operator a necessity for achieving
> > such functionality
> >
> > Thanks a lot,
> >
> > Johannes
> >
> >

Re: TTL for State Entries / FLINK-3089

Posted by Ufuk Celebi <uc...@apache.org>.
Looping in Aljoscha and Kostas who are the expert on this. :-)

On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
<jo...@gmail.com> wrote:
> Hi,
>
> I am trying to achieve a stream-to-stream join with big windows and are
> searching for a way to clean up state of old keys. I am already using a
> RichCoProcessFunction
>
> I found there is already an existing ticket
>
> https://issues.apache.org/jira/browse/FLINK-3089
>
> but I have doubts that a registration of a timer for every incoming event is
> feasible as the timers seem to reside in an in-memory queue.
>
> The task is somewhat similar to the following blog post:
> http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink
>
> Is the implementation of a custom window operator a necessity for achieving
> such functionality
>
> Thanks a lot,
>
> Johannes
>
>