Posted to user@flink.apache.org by Philip Lee <ph...@gmail.com> on 2016/02/15 23:17:48 UTC
Hello, Pipelining Question
Hi,
I found some interesting results from a comparison of Spark SQL and Flink.
Just for your information, Spark SQL runs HiveQL queries on the Spark engine.
As far as we know, when we run a Flink job, the functions can overlap through
*pipelining*, as in this picture.
[image: Inline image 1]
Likewise, Spark supports *pipelining*, as I read in a Spark presentation; its
functions can overlap as well, but there seems to be some boundary.
For example, in *Flink*, the functions reading the multiple inputs can run
together *with the join function*, as in the picture above. In *Spark*,
reading the multiple inputs can also overlap, but the join function appears
to be *separated* from the reading functions (you can see from the start
times and durations that the join step is separated).
[image: Inline image 2]
Is this because Spark is batch processing in memory, whereas Flink is
streaming processing in memory?
Best,
Phil
Re: Hello, Pipelining Question
Posted by Fabian Hueske <fh...@gmail.com>.
Yes, Flink is a pipelined system because it can ship data over the network
while it is being produced (pipelined network communication).
In contrast, Spark produces a result completely before it is sent over the
network in a batch fashion.
However, Flink also supports batched data exchange, similar to Spark.
Best, Fabian
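To make the difference concrete, here is a minimal self-contained Java sketch (a toy model using a blocking queue, not Flink or Spark internals; the class and method names are hypothetical). In the pipelined case the producer hands each record to the consumer as soon as it is produced, so the two operators overlap in time; in the batch case the full intermediate result is materialized before the consumer starts reading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ExchangeSketch {

    // Pipelined exchange: each record is shipped to the consumer as soon
    // as it is produced, so producer and consumer run concurrently.
    static List<Integer> pipelined(List<Integer> input) throws InterruptedException {
        BlockingQueue<Integer> channel = new LinkedBlockingQueue<>();
        List<Integer> received = new ArrayList<>();
        Thread producer = new Thread(() -> {
            for (int x : input) {
                channel.add(x * 2); // apply a "map" and ship immediately
            }
        });
        producer.start();
        // Consumer starts taking records while the producer is still running.
        for (int i = 0; i < input.size(); i++) {
            received.add(channel.take());
        }
        producer.join();
        return received;
    }

    // Batch exchange: the producer materializes its complete result first;
    // only then is it handed to the consumer in one go.
    static List<Integer> batch(List<Integer> input) {
        List<Integer> materialized = new ArrayList<>();
        for (int x : input) {
            materialized.add(x * 2); // produce everything up front
        }
        return new ArrayList<>(materialized); // hand over the finished block
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> in = List.of(1, 2, 3);
        System.out.println(pipelined(in)); // [2, 4, 6]
        System.out.println(batch(in));     // [2, 4, 6]
    }
}
```

Both variants compute the same result; the difference is only *when* data crosses the operator boundary. In the Flink DataSet API the exchange behavior can likewise be switched to Spark-style batched shuffles via `ExecutionConfig#setExecutionMode(ExecutionMode.BATCH)`.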