Posted to user@flink.apache.org by Philip Lee <ph...@gmail.com> on 2016/02/15 23:17:48 UTC

Hello, Pipelining Question

Hi,

I found some interesting results from a comparison of Spark SQL and Flink.
Just for your information, Spark SQL runs HiveQL queries on a Spark cluster.


As far as we know, when we run a Flink job, the functions can be overlapped
through *pipelining*, as in this picture.

[image: Inline image 1]

Likewise, Spark supports *pipelining*, as I read in a Spark presentation, so
functions can be overlapped as well. But it seems like there is some boundary.

For example, in *Flink*, the functions that read multiple inputs can run
together *with the join function*, as in the picture above. In *Spark*,
reading multiple inputs can also happen together, but the join function
seems to be *separated* from the reading functions. (You can see from the
start times and durations that the join step runs separately.)

[image: Inline image 2]

Is this because Spark is batch processing in memory, whereas Flink is
streaming processing in memory?

Best,
Phil

Re: Hello, Pipelining Question

Posted by Fabian Hueske <fh...@gmail.com>.
Yes, Flink is a pipelined system because it is able to ship data over
the network while it is being produced (pipelined network communication).
In contrast, Spark produces a result completely before it is sent over
the network in a batch fashion.

However, Flink also supports batched data exchange, similar to Spark.

Best, Fabian
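Fabian's distinction can be sketched in plain Python (a conceptual
illustration only, not Flink or Spark API; the stage names and the event
log are made up for the example). With pipelined exchange, each record
flows downstream as soon as it is produced, so producer and consumer
overlap; with batch exchange, the intermediate result is fully
materialized before the next stage starts:

```python
# Conceptual sketch of pipelined vs. batch data exchange
# (illustration only, not real Flink/Spark code).

def produce(log):
    # Upstream stage: emits records one at a time, logging each emission.
    for i in range(3):
        log.append(f"produce {i}")
        yield i

def pipelined(log):
    # Pipelined exchange: each record is consumed as soon as it is
    # produced, so "produce" and "consume" events interleave.
    for record in produce(log):
        log.append(f"consume {record}")

def batch(log):
    # Batch exchange: the intermediate result is fully materialized
    # first, so all "produce" events precede any "consume" event.
    materialized = list(produce(log))
    for record in materialized:
        log.append(f"consume {record}")

pipe_log, batch_log = [], []
pipelined(pipe_log)
batch(batch_log)
print(pipe_log)   # produce/consume interleave (pipelined)
print(batch_log)  # all produces first, then all consumes (batch)
```

The interleaved log is what lets a downstream operator (like the join in
the pictures above) start while its inputs are still being read.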
