Posted to dev@storm.apache.org by "Boyang Jerry Peng (JIRA)" <ji...@apache.org> on 2015/11/02 18:10:28 UTC

[jira] [Comment Edited] (STORM-898) Add priorities and per user resource guarantees to Resource Aware Scheduler

    [ https://issues.apache.org/jira/browse/STORM-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985547#comment-14985547 ] 

Boyang Jerry Peng edited comment on STORM-898 at 11/2/15 5:09 PM:
------------------------------------------------------------------

[~revans2] thanks for your comments! Minimizing fragmentation in the cluster is a hard problem and perhaps not in the scope of this jira.  To deal effectively with fragmentation, the executors in the cluster will need to be re-scheduled periodically, depending on how much churn (starting and killing topologies) there is in the cluster.  There are a number of papers in academia that propose algorithms/mechanisms to deal with this.  However, for now let's just ignore fragmentation and judge whether a topology can be scheduled only by actually trying to schedule it. If the topology cannot be scheduled, either the cluster is full or there is too much fragmentation. More specifically, fragmentation is the likely cause when the cluster's available resources exceed what the topology requires and the scheduler still cannot find a valid schedule.  Efficiently and effectively dealing with fragmentation is not a trivial problem, so I propose we file a follow-up jira for it.  I hope this answers your question.
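The check proposed in this comment (try to schedule; if that fails while the cluster's aggregate free resources still cover the topology's needs, suspect fragmentation) can be illustrated with a minimal sketch. It assumes a simplified model in which each node has free CPU and memory and each executor needs a fixed (cpu, mem) pair. `Node`, `try_schedule`, and `diagnose` are hypothetical names, not Storm APIs, and first-fit decreasing stands in for whatever placement strategy the Resource Aware Scheduler actually uses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Free capacity on one node (hypothetical model, not a Storm class)."""
    name: str
    cpu: float  # free CPU
    mem: float  # free memory in MB


def try_schedule(nodes, executors):
    """Place each (cpu, mem) executor with first-fit decreasing.

    Returns one node name per placed executor, or None if some executor
    fits nowhere. Works on a copy, so the input nodes are not modified.
    """
    free = {n.name: [n.cpu, n.mem] for n in nodes}
    placement = []
    # Largest executors first: they are the hardest to fit.
    for cpu, mem in sorted(executors, key=lambda e: (e[1], e[0]), reverse=True):
        for name, (fcpu, fmem) in free.items():
            if fcpu >= cpu and fmem >= mem:
                free[name][0] -= cpu
                free[name][1] -= mem
                placement.append(name)
                break
        else:
            return None  # no node can host this executor
    return placement


def diagnose(nodes, executors):
    """Return 'schedulable', 'fragmented', or 'cluster full'."""
    if try_schedule(nodes, executors) is not None:
        return "schedulable"
    enough_cpu = sum(n.cpu for n in nodes) >= sum(c for c, _ in executors)
    enough_mem = sum(n.mem for n in nodes) >= sum(m for _, m in executors)
    if enough_cpu and enough_mem:
        # Enough resources in aggregate, yet no valid placement: likely fragmentation.
        return "fragmented"
    return "cluster full"
```

For example, with two nodes that each have 50 CPU and 1024 MB free, a single executor needing 80 CPU and 512 MB is reported as "fragmented": the cluster has 100 CPU free in total, but no single node can host the executor. Because first-fit decreasing is only a heuristic, a "fragmented" result may also mean the heuristic missed a valid packing.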



> Add priorities and per user resource guarantees to Resource Aware Scheduler
> ---------------------------------------------------------------------------
>
>                 Key: STORM-898
>                 URL: https://issues.apache.org/jira/browse/STORM-898
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Robert Joseph Evans
>            Assignee: Boyang Jerry Peng
>         Attachments: Resource Aware Scheduler for Storm.pdf
>
>
> In a multi-tenant environment we would like to be able to give individual users a guarantee of how much CPU/Memory/Network they will be able to use in a cluster.  We would also like to know which topologies a user feels are the most important to keep running if there are not enough resources to run all of their topologies.
> Each user should be able to specify if their topology is production, staging, or development. Within each of those categories a user should be able to give a topology a priority, 0 to 10 with 10 being the highest priority (or something like this).
> If there are not enough resources on a cluster to run a topology, assume this topology is already running and using resources, and find the user that is most over their guaranteed resources.  Shoot the lowest priority topology for that user, and repeat until this topology is able to run or this topology would be the one shot.   Ideally we don't actually shoot anything until we know that we would have made enough room.
> If the cluster is over-subscribed, everyone is under their guarantee, and this topology would not put the user over their guarantee, shoot the lowest priority topology in this worker's resource pool until there is enough room to run the topology or this topology is the one that would be shot.  We might also want to think about what to do if we are going to shoot a production topology in an oversubscribed case; perhaps we can shoot a non-production topology instead, even if the other user is not over their guarantee.
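The eviction policy quoted above (pretend the new topology is running, find the user most over their guarantee, shoot that user's lowest-priority topology, repeat) can be sketched as a dry run that only returns the list of topologies it would shoot, so nothing is killed until enough room is known to exist. This is a minimal illustration under stated assumptions, not the scheduler's implementation: resources are collapsed into one number, the production/staging/development category is assumed to outrank the numeric 0-10 priority, users without a guarantee are treated as having a guarantee of 0, and `Topology`, `plan_evictions`, and `CATEGORY_RANK` are hypothetical names. The over-subscribed case where nobody is over their guarantee is not modeled; the sketch simply gives up.

```python
from dataclasses import dataclass

# Assumed ordering (hypothetical): lower rank is shot first.
CATEGORY_RANK = {"development": 0, "staging": 1, "production": 2}


@dataclass(frozen=True)
class Topology:
    name: str
    owner: str
    category: str  # "production", "staging" or "development"
    priority: int  # 0..10, 10 = most important
    need: float    # resources required, collapsed to a single number


def importance(t):
    """Sort key: the smallest value is the topology shot first."""
    return (CATEGORY_RANK[t.category], t.priority)


def plan_evictions(free, running, guarantees, new):
    """Return the topologies that would be shot so `new` fits, or None.

    Dry run only: `running` is not modified and nothing is killed.
    """
    alive = list(running) + [new]  # pretend the new topology is already running
    evicted = []
    while free < new.need:
        usage = {}
        for t in alive:
            usage[t.owner] = usage.get(t.owner, 0.0) + t.need
        # User most over their guarantee.
        user = max(usage, key=lambda u: usage[u] - guarantees.get(u, 0.0))
        if usage[user] - guarantees.get(user, 0.0) <= 0:
            return None  # nobody is over their guarantee; case not modeled here
        victim = min((t for t in alive if t.owner == user), key=importance)
        if victim is new:
            return None  # the new topology would be the one shot
        alive.remove(victim)
        evicted.append(victim)
        free += victim.need
    return evicted
```

For instance, if alice (guarantee 100) runs a development topology needing 60 and a production topology needing 80, bob (guarantee 100) runs one needing 50, and only 10 units are free, a new staging topology from bob needing 40 leads to a plan that shoots only alice's development topology: alice is the user most over her guarantee, and that topology is her least important.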



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)