Posted to user@spark.apache.org by "ashok34668@yahoo.com.INVALID" <as...@yahoo.com.INVALID> on 2022/01/14 19:31:41 UTC

Spark with parallel processing and event driven architecture

Hi gurus,
I am trying to understand the role of Spark in an event-driven architecture. I know Spark deals with massive parallel processing. However, does Spark follow an event-driven architecture like Kafka as well? Say, handling producers, filtering, and pushing the events to consumers like a database, etc.
thanking you

Re: Spark with parallel processing and event driven architecture

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hm, perhaps some intro will help.


An event is a change in state, or an update to some key business system. An
event-driven architecture (EDA) consists of an event producer, an event
consumer, and an event broker. If we follow this line of thought, an event
is a primitive construct that travels from the producer to the consumer via
an event broker. It carries a state change that happened in the producer:
for example, new trade prices were placed, or the temperature reading at a
certain time was 21 degrees Celsius. A sequence of related events is
commonly called a stream, which brings up things like Spark Structured
Streaming etc.
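
For instance, a single temperature event could be serialised as a small
JSON record on the broker (the field names here are purely illustrative):

  {"sensor": "probe-01", "event_time": "2022-01-14T19:00:00", "temperature": 21.0}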


EDAs are ideal for flexibility and moving quickly. They are commonly found
in modern applications that use microservices (an architectural style that
structures an application as a collection of services that are highly
scalable, loosely coupled, independently deployable and organised around
business capabilities). They get their name because each function of the
application operates as an independent service, or microservice. A
microservice is typically packaged and run within a container.


Back to Spark in EDA: Spark relies on Kafka, Google Pub/Sub or other
messaging systems to deliver the related streaming data via a topic or
topics and present them to Spark. At this stage Spark does not care how
this streaming data is produced. Spark relies on the appropriate API to
read data from Kafka
<https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html>
or from Google Pub/Sub
<https://cloud.google.com/pubsub/lite/docs/spark#console_1>. For example,
if you are processing temperature, you construct a streaming DataFrame
that subscribes to a topic, say "temperature". As long as you have the
correct jar files to interface with Kafka, this should work. In reality,
Kafka will be running on its own container(s) plus the ZooKeeper
containers. (A container is a useful resource allocation and sharing
technology, all about packaging software for deployment.) Your streaming
DataFrame looks like this:


streamingDataFrame = self.spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
.....

All you need to know is the value of kafka.bootstrap.servers, which is
read from the yaml file at start-up, so you can easily change it without a
code change. Someone else can look after the Kafka part, and most likely
Kafka could be running on its own Kubernetes (k8s) cluster.
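
As a rough sketch of that idea (the config.yml path and its internal
layout are my assumptions; the Kafka source options come from the
Structured Streaming Kafka integration guide linked above):

  import yaml
  from pyspark.sql import SparkSession

  # load the settings once at start-up; change brokers here, not in the code
  with open("config.yml") as f:
      config = yaml.safe_load(f)

  spark = SparkSession.builder.appName("temperatureStream").getOrCreate()

  streamingDataFrame = spark \
      .readStream \
      .format("kafka") \
      .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
      .option("subscribe", "temperature") \
      .load()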


Coming to Spark, you can also deploy Spark on k8s for scalability and
availability. At this stage, you can build a complete decision engine on
Spark Structured Streaming running on k8s. You can work out average
temperatures over a period and write them to a time series database via
the Spark API, or you can decide to publish them to another Kafka topic.
In short, you can see this architecture is pretty agile, so to speak, with
loose deployment of microservices.
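
A minimal sketch of that aggregation, assuming the JSON payload shown
earlier (the window length, watermark and output topic name are
illustrative choices, not fixed by anything above):

  from pyspark.sql import functions as F

  # Kafka delivers the payload as bytes; parse the JSON value
  parsed = streamingDataFrame \
      .select(F.from_json(F.col("value").cast("string"),
                          "event_time TIMESTAMP, temperature DOUBLE").alias("e")) \
      .select("e.*")

  # average temperature per 5-minute window, tolerating 10 minutes of lateness
  averages = parsed \
      .withWatermark("event_time", "10 minutes") \
      .groupBy(F.window("event_time", "5 minutes")) \
      .agg(F.avg("temperature").alias("avgTemperature"))

  # publish the aggregates to another Kafka topic
  query = averages \
      .select(F.to_json(F.struct("window", "avgTemperature")).alias("value")) \
      .writeStream \
      .format("kafka") \
      .option("kafka.bootstrap.servers", config['MDVariables']['bootstrapServers']) \
      .option("topic", "avgTemperature") \
      .option("checkpointLocation", "/tmp/avgTemperature/checkpoint") \
      .outputMode("update") \
      .start()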


Some of these terms, like Kubernetes, containers, microservices, pods,
Docker, Docker images etc., I have tried to explain here
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/>


HTH


   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 14 Jan 2022 at 19:33, ashok34668@yahoo.com.INVALID
<as...@yahoo.com.invalid> wrote:

> Hi gurus,
>
> I am trying to understand the role of Spark in an event-driven
> architecture. I know Spark deals with massive parallel processing. However,
> does Spark follow an event-driven architecture like Kafka as well? Say,
> handling producers, filtering, and pushing the events to consumers like a
> database, etc.
>
> thanking you
>