You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Rafi Aroch <ra...@walkme.com> on 2017/12/26 12:51:25 UTC

CEP: Dynamic Patterns

Hi,

I'm Rafi, Data Architect at WalkMe.

Our Mobile platform generates events coming from the end-users mobile
device for different actions that the user does.

The use case I wanted to implement using Flink CEP is as follows:

We would like to expose a UI where we could define a set of rules. Rules
can be statefull, like:

User did X 4 times in the last hour
AND
User did Y and then did Z in a session
AND
User average session duration is > 60 seconds

As the set of rules are met, we would like to trigger an action. Like a
REST call, fire event, etc.

This sounds like a good fit for Flink CEP, except that currently, I
understand that CEP patterns have to be "hard-coded" in my jobs code in
order to build the graph.
This set of rules may change many times a day. So re-deploying a Flink job
is not an option (or is it?).

I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
was hoping you plan to add this feature soon :)

This would make a powerful feature and open up many interesting use-cases.

Meanwhile, can you suggest a way of implementing this use-case?

Hope this makes sense.
Would love to hear your thoughts.

Thanks,
Rafi

Re: CEP: Dynamic Patterns

Posted by Rafi Aroch <ra...@walkme.com>.
Hi Kostas & Ufuk,

Appreciate your detailed response. I'll give it a try.

Thanks,
Rafi

On Thu, Dec 28, 2017 at 5:16 PM, Ufuk Celebi <uc...@apache.org> wrote:

> Looks good to me, Klou. I would only add the following two things:
>
> 1) UIDs need to be unique for the complete program, so you would need
> more specific ones, e.g. `filter-1-version-1` instead of only
> `version-1`. These are used to map state from the savepoint back to
> your program. We want to keep the state of those patterns that are not
> changed, but want to get rid of patterns that are updated with this
> approach. Note that new patterns will start from scratch whereas kept
> patterns will continue from whey they left off (e.g. if things already
> partially matched, this will be kept).
>
> 2) The flag has been renamed to
> ```
> --allowNonRestoredState
> ``
>
> – Ufuk
>
>
> On Thu, Dec 28, 2017 at 3:48 PM, Kostas Kloudas
> <k....@data-artisans.com> wrote:
> > Hi again Rafi,
> >
> > Coming back to the second part of the question on what you can do right
> now,
> > I would suggest that you launch your initial job with your initial
> patterns
> > and
> > in your code you assign UID’s to your sources and the CEP operators,
> e.g.:
> >
> > CEP.pattern(input, mySuperPattern).select(muSelectFunction).uid(“
> version1")
> >
> > When you want to update the pattern (delete, add a new one or update an
> > existing one):
> >
> > 1) take a savepoint from your running job and kill it.
> > 2) update your patterns, and on the updated patterns only, change the
> uids
> > so that the
> >     state in the savepoint corresponding to your previous version is
> > ignored.
> > 3) restart your job from the savepoint with the “—ignoreUnmappedState”
> > flag, as described here:
> >     https://issues.apache.org/jira/browse/FLINK-4445
> >
> > This will allow your job to restart from where it left off (the sources
> > checkpoint their state),
> > unchanged patterns to continue from where they were (as you did not
> change
> > their uids)
> > and new/updated patterns to start from scratch as they have a new uid.
> >
> > I have not really tried it but I think it can work although it requires
> some
> > manual stopping/
> > restarting.
> >
> > I do not know if Ufuk has something to add to this.
> >
> > Hope this helps,
> > Kostas
> >
> >
> > On Dec 28, 2017, at 2:57 PM, Kostas Kloudas <k.kloudas@data-artisans.com
> >
> > wrote:
> >
> > Hi Rafi,
> >
> > Currently this is unfortunately not supported out of the box.
> > To support this, we need 2 features with one having to be added in Flink
> > itself,
> > and the other to the CEP library.
> >
> > The first one is broadcast state and the ability to connect keyed and
> > non-keyed
> > streams. This one is to be added to Flink itself and the good news are
> that
> > this
> > feature is scheduled to be added to Flink 1.5.
> >
> > The second feature is to modify the CEP operator so that it can support
> > multiple
> > patterns and match incoming events against all of them. For this I have
> no
> > clear
> > deadline in my mind, but given that there are more and more people asking
> > for
> > it, I think it is going to be added soon.
> >
> > Thanks for raising the issue in the mailing list,
> > Kostas
> >
> > On Dec 27, 2017, at 2:44 PM, Ufuk Celebi <uc...@apache.org> wrote:
> >
> > Hey Rafi,
> >
> > this is indeed a very nice feature to have. :-) I'm afraid that this
> > is currently hard to do manually with CEP. Let me pull in Dawid and
> > Klou (cc'd) who have worked a lot on CEP. They can probably update you
> > on the plan for FLINK-7129.
> >
> > Best,
> >
> > Ufuk
> >
> >
> > On Tue, Dec 26, 2017 at 8:47 PM, Shivam Sharma <28shivamsharma@gmail.com
> >
> > wrote:
> >
> > Hi Rafi,
> >
> > Even I also wanted this facility from Flink Core. But I think this is
> > already solved by Uber on Flink.
> >
> > https://eng.uber.com/athenax/
> >
> > Best
> >
> > On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:
> >
> > Hi,
> >
> > I'm Rafi, Data Architect at WalkMe.
> >
> > Our Mobile platform generates events coming from the end-users mobile
> > device for different actions that the user does.
> >
> > The use case I wanted to implement using Flink CEP is as follows:
> >
> > We would like to expose a UI where we could define a set of rules. Rules
> > can be statefull, like:
> >
> > User did X 4 times in the last hour
> > AND
> > User did Y and then did Z in a session
> > AND
> > User average session duration is > 60 seconds
> >
> > As the set of rules are met, we would like to trigger an action. Like a
> > REST call, fire event, etc.
> >
> > This sounds like a good fit for Flink CEP, except that currently, I
> > understand that CEP patterns have to be "hard-coded" in my jobs code in
> > order to build the graph.
> > This set of rules may change many times a day. So re-deploying a Flink
> job
> > is not an option (or is it?).
> >
> > I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129
> and
> > was hoping you plan to add this feature soon :)
> >
> > This would make a powerful feature and open up many interesting
> use-cases.
> >
> > Meanwhile, can you suggest a way of implementing this use-case?
> >
> > Hope this makes sense.
> > Would love to hear your thoughts.
> >
> > Thanks,
> > Rafi
> >
> >
> >
> >
> > --
> > Shivam Sharma
> > Data Engineer @ Goibibo
> > Indian Institute Of Information Technology, Design and Manufacturing
> > Jabalpur
> > Mobile No- (+91) 8882114744
> > Email:- 28shivamsharma@gmail.com
> > LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
> > <https://www.linkedin.com/in/28shivamsharma>*
> >
> >
> >
>

Re: CEP: Dynamic Patterns

Posted by Ufuk Celebi <uc...@apache.org>.
Looks good to me, Klou. I would only add the following two things:

1) UIDs need to be unique for the complete program, so you would need
more specific ones, e.g. `filter-1-version-1` instead of only
`version-1`. These are used to map state from the savepoint back to
your program. We want to keep the state of those patterns that are not
changed, but want to get rid of patterns that are updated with this
approach. Note that new patterns will start from scratch whereas kept
patterns will continue from whey they left off (e.g. if things already
partially matched, this will be kept).

2) The flag has been renamed to
```
--allowNonRestoredState
``

– Ufuk


On Thu, Dec 28, 2017 at 3:48 PM, Kostas Kloudas
<k....@data-artisans.com> wrote:
> Hi again Rafi,
>
> Coming back to the second part of the question on what you can do right now,
> I would suggest that you launch your initial job with your initial patterns
> and
> in your code you assign UID’s to your sources and the CEP operators, e.g.:
>
> CEP.pattern(input, mySuperPattern).select(muSelectFunction).uid(“version1")
>
> When you want to update the pattern (delete, add a new one or update an
> existing one):
>
> 1) take a savepoint from your running job and kill it.
> 2) update your patterns, and on the updated patterns only, change the uids
> so that the
>     state in the savepoint corresponding to your previous version is
> ignored.
> 3) restart your job from the savepoint with the “—ignoreUnmappedState”
> flag, as described here:
>     https://issues.apache.org/jira/browse/FLINK-4445
>
> This will allow your job to restart from where it left off (the sources
> checkpoint their state),
> unchanged patterns to continue from where they were (as you did not change
> their uids)
> and new/updated patterns to start from scratch as they have a new uid.
>
> I have not really tried it but I think it can work although it requires some
> manual stopping/
> restarting.
>
> I do not know if Ufuk has something to add to this.
>
> Hope this helps,
> Kostas
>
>
> On Dec 28, 2017, at 2:57 PM, Kostas Kloudas <k....@data-artisans.com>
> wrote:
>
> Hi Rafi,
>
> Currently this is unfortunately not supported out of the box.
> To support this, we need 2 features with one having to be added in Flink
> itself,
> and the other to the CEP library.
>
> The first one is broadcast state and the ability to connect keyed and
> non-keyed
> streams. This one is to be added to Flink itself and the good news are that
> this
> feature is scheduled to be added to Flink 1.5.
>
> The second feature is to modify the CEP operator so that it can support
> multiple
> patterns and match incoming events against all of them. For this I have no
> clear
> deadline in my mind, but given that there are more and more people asking
> for
> it, I think it is going to be added soon.
>
> Thanks for raising the issue in the mailing list,
> Kostas
>
> On Dec 27, 2017, at 2:44 PM, Ufuk Celebi <uc...@apache.org> wrote:
>
> Hey Rafi,
>
> this is indeed a very nice feature to have. :-) I'm afraid that this
> is currently hard to do manually with CEP. Let me pull in Dawid and
> Klou (cc'd) who have worked a lot on CEP. They can probably update you
> on the plan for FLINK-7129.
>
> Best,
>
> Ufuk
>
>
> On Tue, Dec 26, 2017 at 8:47 PM, Shivam Sharma <28...@gmail.com>
> wrote:
>
> Hi Rafi,
>
> Even I also wanted this facility from Flink Core. But I think this is
> already solved by Uber on Flink.
>
> https://eng.uber.com/athenax/
>
> Best
>
> On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:
>
> Hi,
>
> I'm Rafi, Data Architect at WalkMe.
>
> Our Mobile platform generates events coming from the end-users mobile
> device for different actions that the user does.
>
> The use case I wanted to implement using Flink CEP is as follows:
>
> We would like to expose a UI where we could define a set of rules. Rules
> can be statefull, like:
>
> User did X 4 times in the last hour
> AND
> User did Y and then did Z in a session
> AND
> User average session duration is > 60 seconds
>
> As the set of rules are met, we would like to trigger an action. Like a
> REST call, fire event, etc.
>
> This sounds like a good fit for Flink CEP, except that currently, I
> understand that CEP patterns have to be "hard-coded" in my jobs code in
> order to build the graph.
> This set of rules may change many times a day. So re-deploying a Flink job
> is not an option (or is it?).
>
> I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
> was hoping you plan to add this feature soon :)
>
> This would make a powerful feature and open up many interesting use-cases.
>
> Meanwhile, can you suggest a way of implementing this use-case?
>
> Hope this makes sense.
> Would love to hear your thoughts.
>
> Thanks,
> Rafi
>
>
>
>
> --
> Shivam Sharma
> Data Engineer @ Goibibo
> Indian Institute Of Information Technology, Design and Manufacturing
> Jabalpur
> Mobile No- (+91) 8882114744
> Email:- 28shivamsharma@gmail.com
> LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
> <https://www.linkedin.com/in/28shivamsharma>*
>
>
>

Re: CEP: Dynamic Patterns

Posted by Kostas Kloudas <k....@data-artisans.com>.
Hi again Rafi,

Coming back to the second part of the question on what you can do right now,
I would suggest that you launch your initial job with your initial patterns and 
in your code you assign UID’s to your sources and the CEP operators, e.g.:

CEP.pattern(input, mySuperPattern).select(muSelectFunction).uid(“version1")
When you want to update the pattern (delete, add a new one or update an 
existing one):

1) take a savepoint from your running job and kill it.
2) update your patterns, and on the updated patterns only, change the uids so that the 
    state in the savepoint corresponding to your previous version is ignored.
3) restart your job from the savepoint with the “—ignoreUnmappedState”  flag, as described here:
    https://issues.apache.org/jira/browse/FLINK-4445 <https://issues.apache.org/jira/browse/FLINK-4445>

This will allow your job to restart from where it left off (the sources checkpoint their state),
unchanged patterns to continue from where they were (as you did not change their uids)
and new/updated patterns to start from scratch as they have a new uid.

I have not really tried it but I think it can work although it requires some manual stopping/
restarting.

I do not know if Ufuk has something to add to this.

Hope this helps,
Kostas


> On Dec 28, 2017, at 2:57 PM, Kostas Kloudas <k....@data-artisans.com> wrote:
> 
> Hi Rafi,
> 
> Currently this is unfortunately not supported out of the box.
> To support this, we need 2 features with one having to be added in Flink itself,
> and the other to the CEP library.
> 
> The first one is broadcast state and the ability to connect keyed and non-keyed 
> streams. This one is to be added to Flink itself and the good news are that this 
> feature is scheduled to be added to Flink 1.5.
> 
> The second feature is to modify the CEP operator so that it can support multiple 
> patterns and match incoming events against all of them. For this I have no clear 
> deadline in my mind, but given that there are more and more people asking for 
> it, I think it is going to be added soon.
> 
> Thanks for raising the issue in the mailing list,
> Kostas
> 
>> On Dec 27, 2017, at 2:44 PM, Ufuk Celebi <uce@apache.org <ma...@apache.org>> wrote:
>> 
>> Hey Rafi,
>> 
>> this is indeed a very nice feature to have. :-) I'm afraid that this
>> is currently hard to do manually with CEP. Let me pull in Dawid and
>> Klou (cc'd) who have worked a lot on CEP. They can probably update you
>> on the plan for FLINK-7129.
>> 
>> Best,
>> 
>> Ufuk
>> 
>> 
>> On Tue, Dec 26, 2017 at 8:47 PM, Shivam Sharma <28shivamsharma@gmail.com <ma...@gmail.com>> wrote:
>>> Hi Rafi,
>>> 
>>> Even I also wanted this facility from Flink Core. But I think this is
>>> already solved by Uber on Flink.
>>> 
>>> https://eng.uber.com/athenax/ <https://eng.uber.com/athenax/>
>>> 
>>> Best
>>> 
>>> On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm Rafi, Data Architect at WalkMe.
>>>> 
>>>> Our Mobile platform generates events coming from the end-users mobile
>>>> device for different actions that the user does.
>>>> 
>>>> The use case I wanted to implement using Flink CEP is as follows:
>>>> 
>>>> We would like to expose a UI where we could define a set of rules. Rules
>>>> can be statefull, like:
>>>> 
>>>> User did X 4 times in the last hour
>>>> AND
>>>> User did Y and then did Z in a session
>>>> AND
>>>> User average session duration is > 60 seconds
>>>> 
>>>> As the set of rules are met, we would like to trigger an action. Like a
>>>> REST call, fire event, etc.
>>>> 
>>>> This sounds like a good fit for Flink CEP, except that currently, I
>>>> understand that CEP patterns have to be "hard-coded" in my jobs code in
>>>> order to build the graph.
>>>> This set of rules may change many times a day. So re-deploying a Flink job
>>>> is not an option (or is it?).
>>>> 
>>>> I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
>>>> was hoping you plan to add this feature soon :)
>>>> 
>>>> This would make a powerful feature and open up many interesting use-cases.
>>>> 
>>>> Meanwhile, can you suggest a way of implementing this use-case?
>>>> 
>>>> Hope this makes sense.
>>>> Would love to hear your thoughts.
>>>> 
>>>> Thanks,
>>>> Rafi
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Shivam Sharma
>>> Data Engineer @ Goibibo
>>> Indian Institute Of Information Technology, Design and Manufacturing
>>> Jabalpur
>>> Mobile No- (+91) 8882114744
>>> Email:- 28shivamsharma@gmail.com
>>> LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
>>> <https://www.linkedin.com/in/28shivamsharma>*
> 


Re: CEP: Dynamic Patterns

Posted by Kostas Kloudas <k....@data-artisans.com>.
Hi Rafi,

Currently this is unfortunately not supported out of the box.
To support this, we need 2 features with one having to be added in Flink itself,
and the other to the CEP library.

The first one is broadcast state and the ability to connect keyed and non-keyed 
streams. This one is to be added to Flink itself and the good news are that this 
feature is scheduled to be added to Flink 1.5.

The second feature is to modify the CEP operator so that it can support multiple 
patterns and match incoming events against all of them. For this I have no clear 
deadline in my mind, but given that there are more and more people asking for 
it, I think it is going to be added soon.

Thanks for raising the issue in the mailing list,
Kostas

> On Dec 27, 2017, at 2:44 PM, Ufuk Celebi <uc...@apache.org> wrote:
> 
> Hey Rafi,
> 
> this is indeed a very nice feature to have. :-) I'm afraid that this
> is currently hard to do manually with CEP. Let me pull in Dawid and
> Klou (cc'd) who have worked a lot on CEP. They can probably update you
> on the plan for FLINK-7129.
> 
> Best,
> 
> Ufuk
> 
> 
> On Tue, Dec 26, 2017 at 8:47 PM, Shivam Sharma <28...@gmail.com> wrote:
>> Hi Rafi,
>> 
>> Even I also wanted this facility from Flink Core. But I think this is
>> already solved by Uber on Flink.
>> 
>> https://eng.uber.com/athenax/
>> 
>> Best
>> 
>> On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:
>> 
>>> Hi,
>>> 
>>> I'm Rafi, Data Architect at WalkMe.
>>> 
>>> Our Mobile platform generates events coming from the end-users mobile
>>> device for different actions that the user does.
>>> 
>>> The use case I wanted to implement using Flink CEP is as follows:
>>> 
>>> We would like to expose a UI where we could define a set of rules. Rules
>>> can be statefull, like:
>>> 
>>> User did X 4 times in the last hour
>>> AND
>>> User did Y and then did Z in a session
>>> AND
>>> User average session duration is > 60 seconds
>>> 
>>> As the set of rules are met, we would like to trigger an action. Like a
>>> REST call, fire event, etc.
>>> 
>>> This sounds like a good fit for Flink CEP, except that currently, I
>>> understand that CEP patterns have to be "hard-coded" in my jobs code in
>>> order to build the graph.
>>> This set of rules may change many times a day. So re-deploying a Flink job
>>> is not an option (or is it?).
>>> 
>>> I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
>>> was hoping you plan to add this feature soon :)
>>> 
>>> This would make a powerful feature and open up many interesting use-cases.
>>> 
>>> Meanwhile, can you suggest a way of implementing this use-case?
>>> 
>>> Hope this makes sense.
>>> Would love to hear your thoughts.
>>> 
>>> Thanks,
>>> Rafi
>>> 
>> 
>> 
>> 
>> --
>> Shivam Sharma
>> Data Engineer @ Goibibo
>> Indian Institute Of Information Technology, Design and Manufacturing
>> Jabalpur
>> Mobile No- (+91) 8882114744
>> Email:- 28shivamsharma@gmail.com
>> LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
>> <https://www.linkedin.com/in/28shivamsharma>*


Re: CEP: Dynamic Patterns

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Rafi,

this is indeed a very nice feature to have. :-) I'm afraid that this
is currently hard to do manually with CEP. Let me pull in Dawid and
Klou (cc'd) who have worked a lot on CEP. They can probably update you
on the plan for FLINK-7129.

Best,

Ufuk


On Tue, Dec 26, 2017 at 8:47 PM, Shivam Sharma <28...@gmail.com> wrote:
> Hi Rafi,
>
> Even I also wanted this facility from Flink Core. But I think this is
> already solved by Uber on Flink.
>
> https://eng.uber.com/athenax/
>
> Best
>
> On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:
>
>> Hi,
>>
>> I'm Rafi, Data Architect at WalkMe.
>>
>> Our Mobile platform generates events coming from the end-users mobile
>> device for different actions that the user does.
>>
>> The use case I wanted to implement using Flink CEP is as follows:
>>
>> We would like to expose a UI where we could define a set of rules. Rules
>> can be statefull, like:
>>
>> User did X 4 times in the last hour
>> AND
>> User did Y and then did Z in a session
>> AND
>> User average session duration is > 60 seconds
>>
>> As the set of rules are met, we would like to trigger an action. Like a
>> REST call, fire event, etc.
>>
>> This sounds like a good fit for Flink CEP, except that currently, I
>> understand that CEP patterns have to be "hard-coded" in my jobs code in
>> order to build the graph.
>> This set of rules may change many times a day. So re-deploying a Flink job
>> is not an option (or is it?).
>>
>> I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
>> was hoping you plan to add this feature soon :)
>>
>> This would make a powerful feature and open up many interesting use-cases.
>>
>> Meanwhile, can you suggest a way of implementing this use-case?
>>
>> Hope this makes sense.
>> Would love to hear your thoughts.
>>
>> Thanks,
>> Rafi
>>
>
>
>
> --
> Shivam Sharma
> Data Engineer @ Goibibo
> Indian Institute Of Information Technology, Design and Manufacturing
> Jabalpur
> Mobile No- (+91) 8882114744
> Email:- 28shivamsharma@gmail.com
> LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
> <https://www.linkedin.com/in/28shivamsharma>*

Re: CEP: Dynamic Patterns

Posted by Shivam Sharma <28...@gmail.com>.
Hi Rafi,

Even I also wanted this facility from Flink Core. But I think this is
already solved by Uber on Flink.

https://eng.uber.com/athenax/

Best

On Tue, Dec 26, 2017 at 6:21 PM, Rafi Aroch <ra...@walkme.com> wrote:

> Hi,
>
> I'm Rafi, Data Architect at WalkMe.
>
> Our Mobile platform generates events coming from the end-users mobile
> device for different actions that the user does.
>
> The use case I wanted to implement using Flink CEP is as follows:
>
> We would like to expose a UI where we could define a set of rules. Rules
> can be statefull, like:
>
> User did X 4 times in the last hour
> AND
> User did Y and then did Z in a session
> AND
> User average session duration is > 60 seconds
>
> As the set of rules are met, we would like to trigger an action. Like a
> REST call, fire event, etc.
>
> This sounds like a good fit for Flink CEP, except that currently, I
> understand that CEP patterns have to be "hard-coded" in my jobs code in
> order to build the graph.
> This set of rules may change many times a day. So re-deploying a Flink job
> is not an option (or is it?).
>
> I found this ticket: https://issues.apache.org/jira/browse/FLINK-7129 and
> was hoping you plan to add this feature soon :)
>
> This would make a powerful feature and open up many interesting use-cases.
>
> Meanwhile, can you suggest a way of implementing this use-case?
>
> Hope this makes sense.
> Would love to hear your thoughts.
>
> Thanks,
> Rafi
>



-- 
Shivam Sharma
Data Engineer @ Goibibo
Indian Institute Of Information Technology, Design and Manufacturing
Jabalpur
Mobile No- (+91) 8882114744
Email:- 28shivamsharma@gmail.com
LinkedIn:-*https://www.linkedin.com/in/28shivamsharma
<https://www.linkedin.com/in/28shivamsharma>*