Posted to user@flink.apache.org by David Koch <og...@googlemail.com> on 2016/09/15 16:13:55 UTC

FlinkCEP for large key spaces and long timeouts between events

Hello,

Is FlinkCEP applicable to large key spaces with potentially long timeouts
between events that define a pattern? Ideally, without ridiculous hardware.

More concretely, we segment users (one key per user) based on sequences of
events for that user.

A segment "Abandoned Cart" may be defined by adding items during a browsing
session but no purchase event within the following 5 days. The number of
users is between 1 and 10 million.

Is this type of segmentation scenario a viable use case for FlinkCEP?

We currently segment by building incremental profiles in ES which are then
"matched against segment definition queries" using ES percolators. In
short, we incur costs when interacting with ES.

Regards,

David


PS: Thanks for FlinkForward 2016: very interesting presentations and,
equally important, excellent catering ;-)

Re: FlinkCEP for large key spaces and long timeouts between events

Posted by Till Rohrmann <tr...@apache.org>.
Hi David,

You should be able to solve this kind of problem with Flink's CEP library.
The important thing here is to define a pattern interval length so that
partial matches can time out. Otherwise, you will keep accumulating state
that is never purged, which will eventually cause an out-of-memory (OOM)
error.
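
To illustrate why the timeout bounds the state, here is a minimal
plain-Java sketch of the "Abandoned Cart" case. It is not the FlinkCEP API
(in FlinkCEP the interval would be set on the pattern itself, e.g. via
within); the class and method names are hypothetical, and it assumes the
5-day clock starts at a user's first add-to-cart event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT the FlinkCEP API.
class AbandonedCartSketch {
    private final long timeoutMillis;
    // One entry per user with an open partial match:
    // userId -> timestamp of the first "add to cart" event.
    private final Map<String, Long> openCarts = new HashMap<>();

    AbandonedCartSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Only the first add-to-cart per user starts the clock (assumption).
    void onAddToCart(String userId, long timestampMillis) {
        openCarts.putIfAbsent(userId, timestampMillis);
    }

    // A purchase ends the sequence, so the partial match is dropped.
    void onPurchase(String userId) {
        openCarts.remove(userId);
    }

    // Advance time: report users whose interval expired and purge their state.
    List<String> advanceTo(long nowMillis) {
        List<String> abandoned = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = openCarts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                abandoned.add(entry.getKey());
                it.remove(); // without this, state would grow without bound
            }
        }
        return abandoned;
    }

    int stateSize() {
        return openCarts.size();
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        AbandonedCartSketch sketch = new AbandonedCartSketch(5 * day);
        sketch.onAddToCart("alice", 0L);
        sketch.onAddToCart("bob", 0L);
        sketch.onPurchase("bob");
        System.out.println(sketch.advanceTo(4 * day)); // []
        System.out.println(sketch.advanceTo(5 * day)); // [alice]
        System.out.println(sketch.stateSize());        // 0
    }
}
```

Each user holds state only from the first cart event until either a
purchase or the end of the interval, so the worst case is one open entry
per user rather than an ever-growing history.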

How complex would a pattern be (how many stages, what kind of payload)?
Depending on this, we should be able to estimate the resource requirements.
Alternatively, you could simply give it a try and see how few machines
the cluster can be reduced to.
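
As a rough back-of-the-envelope sketch of such an estimate: worst-case
state is roughly (number of keys) x (events buffered per partial match) x
(bytes per event). Only the 10 million users figure comes from this thread;
the 10 buffered events per user and 200 bytes per event below are purely
illustrative assumptions, and the real footprint also depends on
serialization and the state backend.

```java
// Order-of-magnitude estimate only; all inputs except the user count
// are made-up placeholders to be replaced with measured values.
class StateSizeEstimate {
    static long worstCaseBytes(long keys, long bufferedEventsPerKey, long bytesPerEvent) {
        // multiplyExact throws instead of silently overflowing
        return Math.multiplyExact(Math.multiplyExact(keys, bufferedEventsPerKey), bytesPerEvent);
    }

    public static void main(String[] args) {
        long users = 10_000_000L;  // upper bound mentioned in the thread
        long eventsPerUser = 10L;  // assumption
        long bytesPerEvent = 200L; // assumption
        long bytes = worstCaseBytes(users, eventsPerUser, bytesPerEvent);
        System.out.println(bytes / 1_000_000_000.0 + " GB"); // 20.0 GB
    }
}
```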

Great to hear that you enjoyed the conference :-)

Cheers,
Till
