You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by AndreaKinn <ki...@hotmail.it> on 2017/09/18 13:22:14 UTC

Load distribution through the cluster

Hi, 
I'm experimenting a bit with the cluster. 
I didn't set any options about sharing slots and chains hoping that Flink
decided autonomously how to balance the load through the nodes of the
cluster. My cluster is composed by one job and task manager and two task
manager.

I noted that every time I start the program, just one node is busy (at > 95%
for each cpu core) while the other nodes are completely free (< 3%). Same
arguments for the memory.

So Flink doesn't balance the work on the nodes??
I expected something like the cpu usage was distributed on every nodes.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Load distribution through the cluster

Posted by Chesnay Schepler <ch...@apache.org>.

It should only apply to the map operator.

On 19.09.2017 17:38, AndreaKinn wrote:
> If I apply a sharing slot as in the example:
>
> DataStream<Event> LTzAccStream = env
> 				.addSource(new FlinkKafkaConsumer010<>("topic", new
> CustomDeserializer(), properties))
> 				.assignTimestampsAndWatermarks(new CustomTimestampExtractor())
> 				.map(new MapFunction<Tuple2&lt;String, String>, Event>(){
>                                        @Override
> 					public Event map(Tuple2<String, String> value) throws Exception {
> 						return new Event(value.f0, value.f1);
> 					}
> 				}).slotSharingGroup("group1");
>
> just the map operator is assigned to the shared slot or it happens for the
> entire block (addSource + assignTimestamp + map)?
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: Load distribution through the cluster

Posted by AndreaKinn <ki...@hotmail.it>.

If I apply a sharing slot as in the example:

DataStream<Event> LTzAccStream = env
				.addSource(new FlinkKafkaConsumer010<>("topic", new
CustomDeserializer(), properties))
				.assignTimestampsAndWatermarks(new CustomTimestampExtractor())
				.map(new MapFunction<Tuple2&lt;String, String>, Event>(){ 
                                      @Override 
					public Event map(Tuple2<String, String> value) throws Exception { 
						return new Event(value.f0, value.f1); 
					} 
				}).slotSharingGroup("group1");

just the map operator is assigned to the shared slot or it happens for the
entire block (addSource + assignTimestamp + map)?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Load distribution through the cluster

Posted by Fabian Hueske <fh...@gmail.com>.

There is no notion of "full" in Flink except that one slot will run at most
one subtask of each operator.

The scheduling depends on the structure of the job, the parallelism of the
operators, and the number of slots per TM.
It's hard to tell without knowing the details.

2017-09-19 11:57 GMT+02:00 AndreaKinn <ki...@hotmail.it>:

> So Flink use the other nodes just if one is completely "full" ?
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/
>

Re: Load distribution through the cluster

Posted by AndreaKinn <ki...@hotmail.it>.

So Flink use the other nodes just if one is completely "full" ?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Load distribution through the cluster

Posted by Fabian Hueske <fh...@gmail.com>.

Hi,

Flink's scheduling aims to co-located tasks to reduce network communication
and ease the reasoning about resource/slot consumption.
A slot can execute one subtask of each operator of a program, i.e, a
parallel slice of the program.

You can control the scheduling of tasks by specifying resource groups. [1]
[2]

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/concepts/runtime.html#task-slots-and-resources
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups

2017-09-18 15:22 GMT+02:00 AndreaKinn <ki...@hotmail.it>:

> Hi,
> I'm experimenting a bit with the cluster.
> I didn't set any options about sharing slots and chains hoping that Flink
> decided autonomously how to balance the load through the nodes of the
> cluster. My cluster is composed by one job and task manager and two task
> manager.
>
> I noted that every time I start the program, just one node is busy (at >
> 95%
> for each cpu core) while the other nodes are completely free (< 3%). Same
> arguments for the memory.
>
> So Flink doesn't balance the work on the nodes??
> I expected something like the cpu usage was distributed on every nodes.
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/
>