You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Nico Scherer <jd...@live.de> on 2015/02/23 22:21:50 UTC

Question about scheduling in 0.7

Hi all,

we’re currently playing/messing around with scheduling in flink 0.7. We found out that if we run a single job with a certain degree of parallelism, multiple tasks/vertices are executed within a single task manager at the same time (or at least before the prior stage is switched to finished). Further, we noticed that only as many flink instances are requested as the DoP is set and different stages are running in the same slot. We’re wondering how this is implemented? Is a single slot using threading to execute multiple tasks at once?


Regards,

Nico

Re: Question about scheduling in 0.7

Posted by Stephan Ewen <se...@apache.org>.

In addition, to what Fabian wrote:

Yes, one slot can run multiple Tasks. In the batch API, one slot can run
concurrently one task of each operation (for example one source, one
mapper, one reducer, one sink).

On Mon, Feb 23, 2015 at 10:38 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Nico,
>
> yes, Flink runs tasks in threads. Each TaskManager runs in its own JVM,
> everything within a TaskManager is parallelized in threads. Since a TM can
> offer multiple slots, also tasks across slots run in the same JVM and in
> different threads.
>
> Flink has a pipelined processing model, which means that multiple
> sequential tasks can run at the same time and that data is pushed from task
> to task in a pipelined fashion. While this was the only communication model
> until now, there are currently efforts to also materialize intermediate
> datasets (apart from full sorts) for more flexible data shipping strategies
> and better fault tolerance.
>
> Let me know if you have further questions about Flink internals.
>
> Best, Fabian
>
> 2015-02-23 22:21 GMT+01:00 Nico Scherer <jd...@live.de>:
>
> > Hi all,
> >
> > we’re currently playing/messing around with scheduling in flink 0.7. We
> > found out that if we run a single job with a certain degree of
> parallelism,
> > multiple tasks/vertices are executed within a single task manager at the
> > same time (or at least before the prior stage is switched to finished).
> > Further, we noticed that only as many flink instances are requested as
> the
> > DoP is set and different stages are running in the same slot. We’re
> > wondering how this is implemented? Is a single slot using threading to
> > execute multiple tasks at once?
> >
> >
> > Regards,
> >
> > Nico
> >
> >
> >
> >
>

Re: Question about scheduling in 0.7

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Nico,

yes, Flink runs tasks in threads. Each TaskManager runs in its own JVM,
everything within a TaskManager is parallelized in threads. Since a TM can
offer multiple slots, also tasks across slots run in the same JVM and in
different threads.

Flink has a pipelined processing model, which means that multiple
sequential tasks can run at the same time and that data is pushed from task
to task in a pipelined fashion. While this was the only communication model
until now, there are currently efforts to also materialize intermediate
datasets (apart from full sorts) for more flexible data shipping strategies
and better fault tolerance.

Let me know if you have further questions about Flink internals.

Best, Fabian

2015-02-23 22:21 GMT+01:00 Nico Scherer <jd...@live.de>:

> Hi all,
>
> we’re currently playing/messing around with scheduling in flink 0.7. We
> found out that if we run a single job with a certain degree of parallelism,
> multiple tasks/vertices are executed within a single task manager at the
> same time (or at least before the prior stage is switched to finished).
> Further, we noticed that only as many flink instances are requested as the
> DoP is set and different stages are running in the same slot. We’re
> wondering how this is implemented? Is a single slot using threading to
> execute multiple tasks at once?
>
>
> Regards,
>
> Nico
>
>
>
>