Posted to user@flink.apache.org by Люльченко Юрий Николаевич <un...@mail.ru> on 2021/02/20 07:23:51 UTC

Cep application with Flink

Hi there.
 
I'm a newbie with Flink and I need to develop a CEP application using this technology. At least, we are choosing a tool at the moment, so my question is how well Flink really fits my goals.
 
Business requirements:

There is a mobile application. Its data is collected in three Kafka topics, each with multiple partitions. The topics do not contain ready-made per-user events: first I need to join the data from all the topics, and only after that do we have a clean user event.

The order of events for each user matters.

So the rules can look like this: the user does action A, then B, and then does not perform action C for some period. If such a sequence is received, the system communicates with this user through different channels.

We don't want to use an external DB, only Flink state.
 
My questions are:
 
1. Using Kafka as the input stream of data and collecting events by user.

I think the order of the clean user events will be wrong this way, because the topics are not partitioned by user key and only one topic has this field. So can I reorder these events by the event's time field?
 
2. State of events.
 
Can I query the state using SQL syntax? I don't want to iterate over all records in the store to trigger a communication.
In the case described above (A -> B -> waiting period x -> no C -> communication), the B event is stored in state. If C is received, the system removes B from the store. We need to query the store and get all B records with B.event_time + period_waiting < now_time.
Or can the CEP library do this job with a pattern?
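
Editor's note: the "find all B records whose waiting period has expired" query can be avoided by keeping a queue of deadlines next to the per-user state. The following is a minimal plain-Python sketch of that idea, not Flink code. AbsenceDetector, WAIT_SECONDS, and the event kinds are hypothetical names chosen for illustration, and the sketch assumes events are already fed in event-time order. In Flink itself the analogous building blocks would be keyed state plus event-time timers, or the CEP library's pattern operators.

```python
import heapq

WAIT_SECONDS = 20 * 60  # hypothetical waiting period after B


class AbsenceDetector:
    """Tracks 'A, then B, then no C within WAIT_SECONDS' per user."""

    def __init__(self):
        self.stage = {}     # user -> "A" (saw A) or "B" (saw A, then B)
        self.deadline = {}  # user -> time by which C must arrive
        self.timers = []    # min-heap of (deadline, user)

    def on_event(self, user, kind, ts):
        """Feed events in event-time order; returns users whose wait expired."""
        fired = self.advance_time(ts)
        if kind == "A":
            self.stage[user] = "A"
            self.deadline.pop(user, None)
        elif kind == "B" and self.stage.get(user) == "A":
            self.stage[user] = "B"
            self.deadline[user] = ts + WAIT_SECONDS
            heapq.heappush(self.timers, (ts + WAIT_SECONDS, user))
        elif kind == "C" and self.stage.get(user) == "B":
            # C arrived in time: clear the state so the pending timer goes stale.
            del self.stage[user]
            del self.deadline[user]
        return fired

    def advance_time(self, now):
        """Pop only the timers that are due (deadline < now); no full scan."""
        fired = []
        while self.timers and self.timers[0][0] < now:
            due, user = heapq.heappop(self.timers)
            # Ignore stale timers whose state was cleared or replaced.
            if self.deadline.get(user) == due:
                del self.stage[user]
                del self.deadline[user]
                fired.append(user)
        return fired


detector = AbsenceDetector()
detector.on_event("u1", "A", 0)
detector.on_event("u1", "B", 60)
# Once time moves past B + WAIT_SECONDS without a C, u1 is reported.
expired = detector.advance_time(60 + WAIT_SECONDS + 1)
```

The heap keeps the earliest deadline on top, so each time step touches only the entries that are actually due, which mirrors the condition B.event_time + period_waiting < now_time without iterating over every stored B.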
 
3. Maybe this solution to the requirements is not quite right. But again, can Flink help implement this task?
 
Thanks,
Yuri L.

Re: Cep application with Flink

Posted by Maminspapin <un...@mail.ru>.
Hello, *Jörn Franke*. 

Thank you for the reply.

If I understand your answer correctly, Flink's watermark mechanism should
help me sort events into the order I need. So I should dig deeper into that topic.

For example, I read three topics, do the joins, and then get two events for the
same user in this order:

event B (time_event: 12:00) -> event A (time_event: 10:00)

And after watermarking, the situation is different:

A -> B

Can you confirm I am moving in the right direction?
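
Editor's note: the B-then-A example above can be illustrated with a small buffer that holds events until a watermark passes them. This is a minimal plain-Python sketch, not Flink's API. ReorderBuffer, max_delay, and the minutes-since-midnight timestamps are hypothetical and chosen only for illustration. The sketch assumes a watermark defined as the highest event time seen so far minus a fixed delay.

```python
import heapq


class ReorderBuffer:
    """Holds events until the watermark passes them, then releases in time order."""

    def __init__(self, max_delay):
        self.max_delay = max_delay  # assumed bound on how far out of order events are
        self.max_seen = None        # highest event time seen so far
        self.heap = []              # min-heap of (event_time, seq, name)
        self.seq = 0                # tie-breaker that preserves arrival order

    def add(self, name, event_time):
        """Buffer one event and return the names that are now safe to emit."""
        heapq.heappush(self.heap, (event_time, self.seq, name))
        self.seq += 1
        if self.max_seen is None or event_time > self.max_seen:
            self.max_seen = event_time
        watermark = self.max_seen - self.max_delay
        return self._release(watermark)

    def _release(self, watermark):
        out = []
        while self.heap and self.heap[0][0] <= watermark:
            _, _, name = heapq.heappop(self.heap)
            out.append(name)
        return out


# Times are minutes since midnight; a 3-hour delay bound is assumed.
buffer = ReorderBuffer(max_delay=180)
first = buffer.add("B", 12 * 60)   # nothing released yet
second = buffer.add("A", 10 * 60)  # still buffered
third = buffer.add("D", 16 * 60)   # watermark reaches 13:00, A and B are released in order
```

The delay bound is the key design choice: a larger value tolerates more disorder but makes every event wait longer before it is emitted. An event that arrives with a time already behind the watermark is released immediately in this sketch, so a real pipeline still has to decide how to treat such late events.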

Another point is about communicating with clients. Why would I need a NoSQL
DB? I want to use the Flink CEP library to build a pattern:

- received event A,
- then received event B,
- did not receive event C within 20 minutes after B

If the pattern is completed for a user, I push a record to a special Kafka topic,
and another system sends the notification to the client. This is the picture I
want to get.

Or is there in fact no way to get this result using Flink and the CEP library?

Sorry for disturbing you. I have no experience with Flink at all, and now I
want to understand how well it suits this task.

Thanks,
Yuri L.




Re: Cep application with Flink

Posted by Jörn Franke <jo...@gmail.com>.
You are working in a distributed system, so ordering events by time may not be sufficient (most likely it will not be). Due to network delays, offline devices, etc., an event can arrive much later even though it happened earlier. Look into watermarks in Flink and read up on at-least-once, at-most-once, and exactly-once delivery guarantees.
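
Editor's note: the point about events that arrive much later can be made concrete with a small check. This is a minimal plain-Python sketch under simplifying assumptions, not Flink code. split_late and max_delay are hypothetical names, and "late" is defined here simply as an event time that falls behind the highest time seen so far minus the allowed delay.

```python
def split_late(events, max_delay):
    """Split (name, event_time) pairs, given in arrival order, into on-time and late."""
    on_time, late = [], []
    max_seen = None
    for name, event_time in events:
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        watermark = max_seen - max_delay
        if event_time < watermark:
            # Arrived after the stream had already moved past this time.
            late.append(name)
        else:
            on_time.append(name)
    return on_time, late


# Arrival order differs from event-time order; D arrives far too late.
arrivals = [("A", 100), ("B", 200), ("C", 150), ("D", 50)]
on_time, late = split_late(arrivals, max_delay=60)
```

Here C is out of order but still within the assumed delay, so sorting by time can fix it, while D falls outside the delay and needs an explicit policy such as dropping it or handling it separately.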

How do you plan to connect the mobile app? For example, if you need to send a notification to the mobile app, you have different choices: event bus architectures (see STOMP) or a (NoSQL) database (see also lambda architecture).
