Posted to user@flink.apache.org by Esa Heikkinen <es...@student.tut.fi> on 2018/05/17 13:53:55 UTC

How to Flink can solve this example

Hi

I have attached a fictitious example (PDF file) about processing event traces from data streams (or batch data). I hope the picture in the attachment is clear and understandable.

I would be very interested in how best to solve it with Flink, or whether it is possible at all. If it is possible, can it be solved with, for example, CEP, Table/SQL or Gelly?

A little explanation: the data processing reads three different, parallel streams (or batch data sets): A, B and C. Each of them has events (or records) with different "keys with values" (like K1-K4).

I want to find all event traces that have certain dependencies or patterns between the streams (or batches). Finding a pattern takes three steps:

1)      Search stream A for an event that has the value "X" in K1. If one is found, store it in global data for later use and continue to the next step.

2)      Search stream B for an event that has the value A(K1) in K2. If one is found, store it in global data for later use and continue to the next step.

3)      Search stream C for an event that has the value A(K1) in K1 and the value B(K3) in K2. If one is found, continue to the next step (back to step 1).
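
The three steps above can be read as a small sequential state machine. The following is only a minimal sketch in plain Python, not Flink code and not an answer given in this thread. It assumes that the events of A, B and C have already been combined into one time-ordered sequence, and that each event carries the name of its stream. The names `Event` and `find_traces` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Event:
    stream: str             # "A", "B" or "C" (assumed tag, not in the original data)
    fields: Dict[str, str]  # key/value pairs such as {"K1": "X", "K3": "9"}


Trace = Tuple[Event, Event, Event]


def find_traces(events: Iterable[Event], start_value: str = "X") -> List[Trace]:
    """Scan one time-ordered sequence and collect (A, B, C) traces."""
    traces: List[Trace] = []
    a: Optional[Event] = None  # event stored in step 1
    b: Optional[Event] = None  # event stored in step 2
    for ev in events:
        if a is None:
            # Step 1: an event in A with K1 == "X".
            if ev.stream == "A" and ev.fields.get("K1") == start_value:
                a = ev
        elif b is None:
            # Step 2: an event in B whose K2 equals A(K1).
            if ev.stream == "B" and ev.fields.get("K2") == a.fields["K1"]:
                b = ev
        else:
            # Step 3: an event in C with K1 == A(K1) and K2 == B(K3).
            if (
                ev.stream == "C"
                and ev.fields.get("K1") == a.fields["K1"]
                and "K2" in ev.fields
                and ev.fields["K2"] == b.fields.get("K3")
            ):
                traces.append((a, b, ev))
                a = b = None  # back to step 1
    return traces
```

In this reading, events that do not fit the current step are simply skipped, and only one trace is tracked at a time. Whether that is the intended semantics (as opposed to tracking several overlapping traces) is not stated in the description, so it is an assumption of the sketch.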

If that is not possible with Flink, do you have any idea of tools that can solve this?

Best, Esa


RE: How to Flink can solve this example

Posted by Esa Heikkinen <es...@student.tut.fi>.
Hi

I have been working on this "homework" for about 2 months now. I have tried, for example, CEP (with Scala) and an iterative condition, but I don't really understand how it would work, or whether it works in my case at all.

It seems difficult to store values from a previous state for use in the next state(s), and even more so between different parallel "streams" and variables.
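
One possible way around the "different parallel streams" part, offered here as an assumption and not as something confirmed in this thread, is to tag every event with the name of its stream and merge the three streams into a single time-ordered sequence, so that one piece of state can see all of them. The sketch below shows the idea in plain Python with `heapq.merge`. The integer timestamps and the names `tag` and `merge_streams` are hypothetical, and each input is assumed to be already sorted by timestamp. In Flink the closest counterpart would probably be mapping the streams to a common type and combining them with `union`.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

TimedEvent = Tuple[int, Dict[str, str]]        # (timestamp, fields)
TaggedEvent = Tuple[int, str, Dict[str, str]]  # (timestamp, stream name, fields)


def tag(name: str, stream: Iterable[TimedEvent]) -> Iterator[TaggedEvent]:
    """Attach the stream name to every event of one stream."""
    for timestamp, fields in stream:
        yield (timestamp, name, fields)


def merge_streams(**streams: Iterable[TimedEvent]) -> Iterator[TaggedEvent]:
    """Merge time-ordered streams into one time-ordered, tagged sequence."""
    tagged = [tag(name, stream) for name, stream in streams.items()]
    # Only the timestamp is compared, so the field dicts never need ordering.
    return heapq.merge(*tagged, key=lambda event: event[0])


stream_a = [(1, {"K1": "X"}), (4, {"K1": "X"})]
stream_b = [(2, {"K2": "X", "K3": "9"})]
stream_c = [(3, {"K1": "X", "K2": "9"})]

for timestamp, name, fields in merge_streams(A=stream_a, B=stream_b, C=stream_c):
    print(timestamp, name, fields)
```

Once the events arrive as one sequence, the values A(K1) and B(K3) can live in ordinary variables of a single matcher instead of being shared between parallel pipelines.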

Now I am trying another approach, the Table API / SQL, but it does not look very clear either.

Tips would be very valuable. It takes a surprising amount of time to learn Flink (for special cases). Maybe easier ways exist. I think the anatomy [1] of a Flink program may restrict this, because there are many "initializations" and everything is then executed later in one "source(s)-transform-sink" process. Maybe I would need parallel transforms controlled by some upper-level state machine?

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/api_concepts.html

Best, Esa

From: Georgi Stoyanov <gs...@live.com>
Sent: Friday, May 18, 2018 7:54 PM
To: Esa Heikkinen <es...@student.tut.fi>
Cc: user@flink.apache.org
Subject: RE: How to Flink can solve this example

Hi Esa :)

I don't think there are people here who want to do your homework.
Think about it, read Flink's documentation and ask concrete questions.
Anyway - I took a look and I think that it's possible.

Regards,
Georgi


RE: How to Flink can solve this example

Posted by Georgi Stoyanov <gs...@live.com>.
Hi Esa :)

I don't think there are people here who want to do your homework.
Think about it, read Flink's documentation and ask concrete questions.
Anyway - I took a look and I think that it's possible.

Regards,
Georgi
