You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "Ramanan, Buvana (Nokia - US)" <bu...@nokia-bell-labs.com> on 2016/09/26 18:31:33 UTC

TaskManager & task slots

Hello,

I would like to understand the following better:

https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#configuring-taskmanager-processing-slots

Fundamental question - what is the notion of Task Slot? Does it correspond to one JVM? Or the Task Manager itself corresponds to one JVM?
Example-1 shows a parallelism of 1 and has 3 operators - flatMap, Reduce & Sink. Here comes the question - are these 3 operators running a separate threads within a JVM?

Sorry for the naïve questions. I studied the following links and could not get a clear answer:
https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/general_arch.html
https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/job_scheduling.html

Are there more documents under Flink's wiki site / elsewhere? Please point me to more info on the architecture.

thank you,
regards,
Buvana


Re: TaskManager & task slots

Posted by "yinhua.dai" <yi...@outlook.com>.
OK, thanks.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: TaskManager & task slots

Posted by Fabian Hueske <fh...@gmail.com>.
Yes, this hasn't changed.

Best, Fabain

Am Mi., 21. Nov. 2018 um 08:18 Uhr schrieb yinhua.dai <
yinhua.2018@outlook.com>:

> Hi Fabian,
>
> Is below description still remain the same in Flink 1.6?
>
> Slots do not guard CPU time, IO, or JVM memory. At the moment they only
> isolate managed memory which is only used for batch processing. For
> streaming applications their only purpose is to limit the number of
> parallel
> threads that can be executed by a TaskManager.
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: TaskManager & task slots

Posted by "yinhua.dai" <yi...@outlook.com>.
Hi Fabian,

Is below description still remain the same in Flink 1.6?

Slots do not guard CPU time, IO, or JVM memory. At the moment they only
isolate managed memory which is only used for batch processing. For
streaming applications their only purpose is to limit the number of parallel
threads that can be executed by a TaskManager.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

RE: TaskManager & task slots

Posted by "Ramanan, Buvana (Nokia - US)" <bu...@nokia-bell-labs.com>.
Thank you Ufuk! 
Once the execution graph is arrived at, is there a way to 'view' the graph to understand operator chaining & slot placement specific to my scenario?

-----Original Message-----
From: Ufuk Celebi [mailto:uce@apache.org] 
Sent: Tuesday, September 27, 2016 8:52 AM
To: user@flink.apache.org
Subject: Re: TaskManager & task slots

On Mon, Sep 26, 2016 at 9:46 PM, Ramanan, Buvana (Nokia - US)
<bu...@nokia-bell-labs.com> wrote:
> When the operators (including source / sink) are chained, what is the method
> of communication between them?

In general, when operators are chained they execute in the same
"physical" task and records are passed directly to the next operator
without any serialization happening.

When they are not chained, records are serialized into Flink's Buffer
instances and the consuming operator fetches them (either locally or
remotely).

Re: TaskManager & task slots

Posted by Ufuk Celebi <uc...@apache.org>.
On Mon, Sep 26, 2016 at 9:46 PM, Ramanan, Buvana (Nokia - US)
<bu...@nokia-bell-labs.com> wrote:
> When the operators (including source / sink) are chained, what is the method
> of communication between them?

In general, when operators are chained they execute in the same
"physical" task and records are passed directly to the next operator
without any serialization happening.

When they are not chained, records are serialized into Flink's Buffer
instances and the consuming operator fetches them (either locally or
remotely).

RE: TaskManager & task slots

Posted by "Ramanan, Buvana (Nokia - US)" <bu...@nokia-bell-labs.com>.
Hello Fabian,

Thanks a lot for the explanation. When the operators (including source / sink) are chained, what is the method of communication between them?

We use Kafka for data source and I was interested to learn the mechanism of communication from Kafka to the next operator, say flatMap… So I studied Flink codebase to some extent and looked into our log file as well.

I see a while(true) loop in Task.java’s run() method (which seems to call the StreamTask’s invoke(), which I suppose will eventually execute the code for the operator) and I also see while (running) in Kafka09Fetcher.java’s run method. When Kafka09Fetcher’s run() receives a record from lower layer, it calls emitRecord() method, which I suppose writes into a ByteBuffer?  how does this ByteBuffer get shared with the next operator (for e.g flatMap)? It’s difficult for me to imagine the two while (running) working at the same time when the fetcher & flatMap are chained together.

If my above understanding is incorrect, please help me understand. You may just point the chain of java method calls…

Sorry for sounding confused. I only have a very little knowledge so far from my study. Would appreciate your explanation in this regard.

Thanks,
Buvana

From: Fabian Hueske [mailto:fhueske@gmail.com]
Sent: Monday, September 26, 2016 2:41 PM
To: user@flink.apache.org
Subject: Re: TaskManager & task slots

Hi Buvana,
A TaskManager runs as a single JVM process. A TaskManager provides a certain number of processing slots. Slots do not guard CPU time, IO, or JVM memory. At the moment they only isolate managed memory which is only used for batch processing. For streaming applications their only purpose is to limit the number of parallel threads that can be executed by a TaskManager.
In each processing slot, a full slice of a program can be executed, i.e., one parallel subtask of each operator of a program. Given a simple program (source -> map -> sink), a slot can process one subtask of the source, the mapper, and the sink (it is possible to split a program to be executed in more slots). Each operator can be executed as a separate thread. However, in many situations, operators are chained within the same thread to improve performance (again it is possible to disallow chaining).
Let me know if you have more questions,
Fabian

2016-09-26 20:31 GMT+02:00 Ramanan, Buvana (Nokia - US) <bu...@nokia-bell-labs.com>>:
Hello,

I would like to understand the following better:

https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#configuring-taskmanager-processing-slots

Fundamental question – what is the notion of Task Slot? Does it correspond to one JVM? Or the Task Manager itself corresponds to one JVM?
Example-1 shows a parallelism of 1 and has 3 operators – flatMap, Reduce & Sink. Here comes the question – are these 3 operators running a separate threads within a JVM?

Sorry for the naïve questions. I studied the following links and could not get a clear answer:
https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/general_arch.html
https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/job_scheduling.html

Are there more documents under Flink’s wiki site / elsewhere? Please point me to more info on the architecture.

thank you,
regards,
Buvana



Re: TaskManager & task slots

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Buvana,

A TaskManager runs as a single JVM process. A TaskManager provides a
certain number of processing slots. Slots do not guard CPU time, IO, or JVM
memory. At the moment they only isolate managed memory which is only used
for batch processing. For streaming applications their only purpose is to
limit the number of parallel threads that can be executed by a TaskManager.

In each processing slot, a full slice of a program can be executed, i.e.,
one parallel subtask of each operator of a program. Given a simple program
(source -> map -> sink), a slot can process one subtask of the source, the
mapper, and the sink (it is possible to split a program to be executed in
more slots). Each operator can be executed as a separate thread. However,
in many situations, operators are chained within the same thread to improve
performance (again it is possible to disallow chaining).

Let me know if you have more questions,
Fabian

2016-09-26 20:31 GMT+02:00 Ramanan, Buvana (Nokia - US) <
buvana.ramanan@nokia-bell-labs.com>:

> Hello,
>
>
>
> I would like to understand the following better:
>
>
>
> https://ci.apache.org/projects/flink/flink-docs-
> release-1.1/setup/config.html#configuring-taskmanager-processing-slots
>
>
>
> Fundamental question – what is the notion of Task Slot? Does it correspond
> to one JVM? Or the Task Manager itself corresponds to one JVM?
>
> Example-1 shows a parallelism of 1 and has 3 operators – flatMap, Reduce &
> Sink. Here comes the question – are these 3 operators running a separate
> threads within a JVM?
>
>
>
> Sorry for the naïve questions. I studied the following links and could not
> get a clear answer:
>
> https://ci.apache.org/projects/flink/flink-docs-
> release-1.1/internals/general_arch.html
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/job_
> scheduling.html
>
>
>
> Are there more documents under Flink’s wiki site / elsewhere? Please point
> me to more info on the architecture.
>
>
>
> thank you,
>
> regards,
>
> Buvana
>
>
>