Posted to user@spark.apache.org by Esa Heikkinen <es...@student.tut.fi> on 2018/05/18 07:20:54 UTC

How can Spark solve this example?

Hi

I have attached a fictive example (PDF file) about processing event traces from data streams (or batch data). I hope the picture in the attachment is clear and understandable.

I would be very interested in how best to solve this with Spark, or whether it is possible at all. If it is possible, can it be solved, for example, with CEP?

A little explanation: the processing reads three different, parallel streams (or batches): A, B and C. Each of them contains events (records) with different key-value fields (like K1-K4).

I want to find all event traces that have certain dependencies or patterns between the streams (or batches). Finding a pattern takes three steps (sketched in code after the list):

1)      Search for an event in stream A whose key K1 has the value "X"; if one is found, store it in global data for later use and continue to the next step

2)      Search for an event in stream B whose key K2 has the value A(K1); if one is found, store it in global data for later use and continue to the next step

3)      Search for an event in stream C whose key K1 has the value A(K1) and whose key K2 has the value B(K3); if one is found, continue (back to step 1)
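
To make the intent concrete, below is a rough, framework-independent sketch of the matching logic in Scala, viewed as batch data. The Event type and its fields are made up for illustration:

case class Event(k1: String, k2: String, k3: String, k4: String)

// The three steps as one pass over batch data; ea, eb, ec stand for
// the matched events from streams A, B and C.
def findTraces(a: Seq[Event], b: Seq[Event], c: Seq[Event]): Seq[(Event, Event, Event)] =
  for {
    ea <- a if ea.k1 == "X"                      // step 1: event in A with K1 == "X"
    eb <- b if eb.k2 == ea.k1                    // step 2: event in B with K2 == A(K1)
    ec <- c if ec.k1 == ea.k1 && ec.k2 == eb.k3  // step 3: event in C with K1 == A(K1) and K2 == B(K3)
  } yield (ea, eb, ec)

In the streaming case these searches would have to run incrementally as events arrive, which is why I am asking about CEP-style support.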

If that is not possible with Spark, do you have any ideas for tools that could solve this?

Best, Esa


RE: How can Spark solve this example?

Posted by "정유선님 (JUNG YOUSUN)" <je...@sk.com>.
How about Structured Streaming with Kafka? You can operate over event-time windows. For more information, see https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html
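
For example, a minimal sketch of reading one of the streams with Structured Streaming (the servers, topic name and event schema below are assumptions; B and C would be read the same way):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("EventTraces").getOrCreate()
import spark.implicits._

// Hypothetical event schema; adapt to what K1-K4 really contain.
val schema = new StructType()
  .add("k1", StringType).add("k2", StringType)
  .add("k3", StringType).add("k4", StringType)
  .add("ts", TimestampType)

val streamA = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")  // assumption
  .option("subscribe", "topicA")                   // hypothetical topic for stream A
  .load()
  .select(from_json($"value".cast("string"), schema).as("e"))
  .select("e.*")

// Step 1 of the pattern as a streaming filter; the cross-stream conditions
// can then be expressed over event-time windows.
val candidatesA = streamA.filter($"k1" === "X")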

Sincerely,
Yousun Jeong



RE: How can Spark solve this example?

Posted by Esa Heikkinen <es...@student.tut.fi>.
Hello

That is good to hear, but do there exist some good practical (Python or Scala) examples? This would help a lot.

I tried to do this with Apache Flink (and its CEP), and it was not such a piece of cake.

Best, Esa



Re: How can Spark solve this example?

Posted by Matteo Cossu <el...@gmail.com>.
Hello Esa,
all the steps that you described can be performed with Spark. I don't know about CEP, but Spark Streaming should be enough.
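
For example, the cross-stream conditions could be expressed as watermarked stream-stream joins in Structured Streaming. A sketch only: streamA, streamB and streamC are assumed to be streaming DataFrames with fields k1-k4 and an event-time column ts (as in the Kafka sketch earlier in the thread), the 10-minute window is an arbitrary choice, and chaining two stream-stream joins needs a recent Spark version (2.3+):

import org.apache.spark.sql.functions.{col, expr}

// Step 1: candidate events from A, with a watermark for state cleanup.
val a = streamA.filter(col("k1") === "X")
  .withWatermark("ts", "10 minutes")
  .select(col("k1").as("a_k1"), col("ts").as("a_ts"))

val b = streamB
  .withWatermark("ts", "10 minutes")
  .select(col("k2").as("b_k2"), col("k3").as("b_k3"), col("ts").as("b_ts"))

val c = streamC
  .withWatermark("ts", "10 minutes")
  .select(col("k1").as("c_k1"), col("k2").as("c_k2"), col("ts").as("c_ts"))

// Step 2: an event in B whose K2 equals A's K1, within the window after A.
val ab = a.join(b, expr(
  "b_k2 = a_k1 AND b_ts BETWEEN a_ts AND a_ts + interval 10 minutes"))

// Step 3: an event in C whose K1 equals A's K1 and whose K2 equals B's K3.
val traces = ab.join(c, expr(
  "c_k1 = a_k1 AND c_k2 = b_k3 AND c_ts BETWEEN b_ts AND b_ts + interval 10 minutes"))

The "store to global data" part is implicit in the join state here; if you need more control over the sequencing (back to step 1), mapGroupsWithState gives you arbitrary stateful processing.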

Best,

Matteo
