Posted to dev@flink.apache.org by "shuai.xu (JIRA)" <ji...@apache.org> on 2017/10/19 05:21:01 UTC

[jira] [Created] (FLINK-7870) SlotPool should cancel the slot request to RM if not need any more.

shuai.xu created FLINK-7870:
-------------------------------

             Summary: SlotPool should cancel the slot request to RM if not need any more.
                 Key: FLINK-7870
                 URL: https://issues.apache.org/jira/browse/FLINK-7870
             Project: Flink
          Issue Type: Bug
          Components: Cluster Management
            Reporter: shuai.xu
            Assignee: shuai.xu


1. The SlotPool requests slots from the ResourceManager (RM) when it does not have enough slots.
2. If a slot request is not fulfilled within a certain time, the SlotPool treats the request as timed out and sends a new slot request by triggering a failover in the JobMaster. The previous request is no longer needed, but the RM does not know this.
3. This may cause the RM to request far more resources than the job really needs.
For example:
1. A job needs 100 slots, so the RM requests 100 containers from YARN.
2. YARN is busy and has no resources for the job.
3. The job fails over because the resource request is not fulfilled in time.
4. The job asks for 100 slots again, so the RM has now requested 200 containers from YARN.
5. If the failover happens several times, the number of requested containers keeps growing.
6. When YARN finally has resources, it finds that the job appears to need thousands of containers. This is a waste of resources.
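
The arithmetic of this scenario can be sketched as a small, self-contained Java model. This is not Flink code: the class SlotRequestSketch, the PendingRequests helper and the method outstandingAfterFailovers are hypothetical names made up for illustration, and the "RM" here is only a map of pending requests. The sketch compares the number of outstanding container requests with and without cancelling the timed-out request before re-requesting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotRequestSketch {

    /** Hypothetical stand-in for the RM's bookkeeping of pending slot requests. */
    static final class PendingRequests {
        private final Map<Integer, Integer> slotsByRequestId = new LinkedHashMap<>();
        private int nextId = 0;

        /** Registers a request for the given number of slots and returns its id. */
        int request(int slots) {
            int id = nextId++;
            slotsByRequestId.put(id, slots);
            return id;
        }

        /** Drops a pending request, as the issue proposes for timed-out requests. */
        void cancel(int id) {
            slotsByRequestId.remove(id);
        }

        /** Total slots (containers) still being asked for. */
        int outstanding() {
            return slotsByRequestId.values().stream().mapToInt(Integer::intValue).sum();
        }
    }

    /**
     * Simulates a job that needs slotsNeeded slots and fails over the given
     * number of times because no request is ever fulfilled.
     */
    static int outstandingAfterFailovers(int slotsNeeded, int failovers, boolean cancelOnTimeout) {
        PendingRequests rm = new PendingRequests();
        int current = rm.request(slotsNeeded);
        for (int i = 0; i < failovers; i++) {
            if (cancelOnTimeout) {
                rm.cancel(current); // tell the RM the old request is obsolete
            }
            current = rm.request(slotsNeeded); // the failover re-requests all slots
        }
        return rm.outstanding();
    }

    public static void main(String[] args) {
        // Without cancellation, every failover adds another 100 pending containers.
        System.out.println(outstandingAfterFailovers(100, 9, false)); // prints 1000
        // With cancellation, the pending total stays at what the job needs.
        System.out.println(outstandingAfterFailovers(100, 9, true)); // prints 100
    }
}
```

In this model, nine failovers without cancellation leave 1000 containers requested for a job that needs 100, while cancelling the stale request on each timeout keeps the total at 100.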


