Posted to issues@flink.apache.org by "Ben La Monica (JIRA)" <ji...@apache.org> on 2019/05/21 17:30:00 UTC

[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers

    [ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845062#comment-16845062 ] 

Ben La Monica commented on FLINK-12122:
---------------------------------------

I'm running into this exact problem. I have a CoProcessFunction that holds a large amount of state, and its parallel instances are spread primarily across only 2 of my 6 task managers. This causes memory issues on those boxes while 60GB of RAM on the third box goes unused.
||TaskManager||Slots Used by Memory-Intensive Tasks||
|ip-10-255-58-174:39389|17|
|ip-10-255-58-174:45161|8|
|ip-10-255-58-179:33657|1|
|ip-10-255-58-179:38439|0|
|ip-10-255-58-44:40181|6|
|ip-10-255-58-44:45435|18|
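
For reference, the counts in the table can be compared against a perfectly even spread. This is plain arithmetic on the numbers above, assuming all 50 memory-intensive slots were interchangeable across the six TaskManagers:

```latex
\begin{aligned}
N_{\text{total}} &= 17 + 8 + 1 + 0 + 6 + 18 = 50 \\
N_{\text{even}} &= \tfrac{50}{6} \approx 8.33 \;\Rightarrow\; \text{8 or 9 slots per TaskManager } (4 \cdot 8 + 2 \cdot 9 = 50) \\
\text{per host: } & \texttt{.174} = 17 + 8 = 25,\quad \texttt{.179} = 1 + 0 = 1,\quad \texttt{.44} = 6 + 18 = 24 \\
\frac{\max_i n_i}{N_{\text{even}}} &= \frac{18}{50/6} = 2.16
\end{aligned}
```

So the busiest TaskManager carries roughly 2.2 times its even share, and one host (ip-10-255-58-179) carries almost none.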

As a result, resource usage in my YARN cluster looks like this:

!image-2019-05-21-12-28-29-538.png!

Is there an estimate of when this problem will be fixed? I'm pretty much blocked unless I move to much larger servers, which would be a waste of money :).

> Spread out tasks evenly across all available registered TaskManagers
> --------------------------------------------------------------------
>
>                 Key: FLINK-12122
>                 URL: https://issues.apache.org/jira/browse/FLINK-12122
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>             Fix For: 1.7.3, 1.9.0, 1.8.1
>
>         Attachments: image-2019-05-21-12-28-29-538.png
>
>
> With FLIP-6, we changed the default behaviour of how slots are assigned to {{TaskManagers}}. Instead of spreading them evenly over all registered {{TaskManagers}}, we randomly pick slots from {{TaskManagers}}, with a tendency to fill up one TM before using another. This is a regression with respect to the pre-FLIP-6 code.
> I suggest changing the behaviour so that we try to distribute slots evenly across all available {{TaskManagers}} by considering how many of their slots are already allocated.
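The proposed "least allocated first" selection can be illustrated with a small, self-contained sketch. This is not Flink's actual slot pool code: the classes {{TaskManagerSlots}} and {{SpreadOutSlots}} and their methods are hypothetical, and the sketch assumes "how many of their slots are already allocated" is measured as a utilization ratio (allocated / total), with ties broken by list order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical bookkeeping for one TaskManager; not a Flink class.
final class TaskManagerSlots {
    final String id;
    final int totalSlots;
    int allocatedSlots;

    TaskManagerSlots(String id, int totalSlots) {
        this.id = id;
        this.totalSlots = totalSlots;
    }

    boolean hasFreeSlot() {
        return allocatedSlots < totalSlots;
    }

    // Fraction of this TaskManager's slots that are already allocated.
    double utilization() {
        return (double) allocatedSlots / totalSlots;
    }
}

public class SpreadOutSlots {

    // Returns the TaskManager with a free slot and the lowest utilization,
    // or empty if every TaskManager is full.
    static Optional<TaskManagerSlots> pickLeastUtilized(List<TaskManagerSlots> tms) {
        TaskManagerSlots best = null;
        for (TaskManagerSlots tm : tms) {
            if (!tm.hasFreeSlot()) {
                continue;
            }
            // Strict '<' keeps the earlier TaskManager when utilizations tie.
            if (best == null || tm.utilization() < best.utilization()) {
                best = tm;
            }
        }
        return Optional.ofNullable(best);
    }

    // Allocates `requests` slots one at a time and returns the chosen ids in order.
    static List<String> allocate(List<TaskManagerSlots> tms, int requests) {
        List<String> assignments = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            TaskManagerSlots tm = pickLeastUtilized(tms)
                    .orElseThrow(() -> new IllegalStateException("no free slots left"));
            tm.allocatedSlots++;
            assignments.add(tm.id);
        }
        return assignments;
    }

    public static void main(String[] args) {
        List<TaskManagerSlots> tms = List.of(
                new TaskManagerSlots("tm-1", 4),
                new TaskManagerSlots("tm-2", 4),
                new TaskManagerSlots("tm-3", 4));
        // Prints [tm-1, tm-2, tm-3, tm-1, tm-2, tm-3]
        System.out.println(allocate(tms, 6));
    }
}
```

Using the ratio rather than the raw count of allocated slots means TaskManagers with different slot counts fill up proportionally; with identical TaskManagers the two choices behave the same.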



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)