Posted to dev@slider.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2015/10/07 13:02:27 UTC

[jira] [Commented] (SLIDER-82) Support ANTI_AFFINITY_REQUIRED option

    [ https://issues.apache.org/jira/browse/SLIDER-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946675#comment-14946675 ] 

Steve Loughran commented on SLIDER-82:
--------------------------------------

This is what I'm thinking here:

{panel:bgColor=#FFFFCE}
For roles with anti-affinity enabled, we request them one by one, with a node map for each request that lists every node in the cluster which doesn't have an instance.

As such requests are independent of requests at other priorities, other roles can come up simultaneously; we don't have to play with blacklists.
{panel}

h2. Algorithm
The algorithm starts off very similar to the -002 patch, except that we're now using node maps rather than blacklists, and so can work in parallel. I'll try to start with that code.

# for anti-affinity placements, have a flag on the role status to indicate "anti-affinity request in progress", and a counter of pending requests. These need to be used when reviewing the sizes of components.
# when reviewing such a component, requests for placed instances are made first, and pushed out in parallel.
# next, the current node map is requested
# Exactly one container request is made, asking for any node in the entire set of nodes for which there is neither a live container nor a placed request, with relax locality = false. This is the _anti-affinity request_ (see the sketch after this list).
# If placed requests are satisfied, launch containers there.
# If the outstanding anti-affinity request is satisfied, launch a container on that node, then issue a new anti-affinity request with that node also excluded from the set of possible nodes.
# if placed requests cannot be satisfied, the escalation policy will have to become "just another anti-affinity request". Drop the requested node from the list of historical placements, so that it can be included in future anti-affinity requests (but not any outstanding one, which will be left alone).
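
As an illustration of the anti-affinity request in step 4, here's a minimal sketch against the AMRMClient API; the class, method and parameter names are invented, but the {{ContainerRequest}} constructor and the relaxLocality flag are the real YARN client API.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

/** Illustrative helper only; names are invented, the ContainerRequest API is real. */
final class AntiAffinityRequests {

  /**
   * Build the single outstanding anti-affinity request for a role.
   * @param candidateNodes every node with neither a live container for this
   *                       role nor a placed request
   */
  static ContainerRequest antiAffinityRequest(Resource capability,
      Priority rolePriority, List<String> candidateNodes) {
    return new ContainerRequest(
        capability,
        candidateNodes.toArray(new String[0]),  // only these nodes are acceptable
        null,                                   // no rack constraint
        rolePriority,
        false);                                 // relaxLocality=false: no fallback to rack/any
  }
}
{code}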

With this algorithm, once there's a placement history, requests for instances on those historical nodes will be issued in parallel, and satisfied if the nodes have capacity. It's only for instances with no placement history (including escalated ones) that the time to build up the components is {{O(unplaced-components)}}. As an example, when bringing up 3 Kafka nodes: if we remember where they were, we just ask for them back there again and wait. If there's not any/enough history, then the one-by-one anti-affinity request process happens for all the instances we don't have a place for (sketched below). For escalation, the same process will apply, but only after the escalation timeout for that component kicks in.
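
To make the one-by-one cycle concrete, here's a sketch of the callback path when the outstanding anti-affinity request is granted. All names here are invented for illustration; only the {{AMRMClientAsync}}/{{Container}} calls are the real YARN API.

{code:java}
import java.util.Set;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

/** Illustrative only: drives one role's anti-affinity requests one by one. */
final class AntiAffinityCycle {
  private final AMRMClientAsync<ContainerRequest> amRmClient;
  private final Resource capability;
  private final Priority rolePriority;
  private final Set<String> candidateNodes;  // no live container, no placed request
  private int pending;                       // the "counter of pending" from step 1

  AntiAffinityCycle(AMRMClientAsync<ContainerRequest> amRmClient,
      Resource capability, Priority rolePriority,
      Set<String> candidateNodes, int pending) {
    this.amRmClient = amRmClient;
    this.capability = capability;
    this.rolePriority = rolePriority;
    this.candidateNodes = candidateNodes;
    this.pending = pending;
  }

  /** Called when the single outstanding anti-affinity request is satisfied. */
  void onAllocated(Container granted) {
    // (container launch elided) ... exclude the granted node, then, if more
    // instances are still wanted, issue the next anti-affinity request
    candidateNodes.remove(granted.getNodeId().getHost());
    pending--;
    if (pending > 0 && !candidateNodes.isEmpty()) {
      amRmClient.addContainerRequest(new ContainerRequest(
          capability,
          candidateNodes.toArray(new String[0]),
          null,             // no rack constraint
          rolePriority,
          false));          // relaxLocality=false, as in the previous sketch
    }
  }
}
{code}

A real implementation would also remove the satisfied request from the AMRMClient's tables ({{removeContainerRequest()}}) and hook in the placed-request escalation path; both are elided here.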

h2. Fun little details
# to handle anti-affinity + placement history, we'll have to run through all nodes with a historical placement first, then escalate them to anti-affinity requests.
# the current model of counting nodes by live + outstanding + delete-in-progress will need to be extended with a queued-for-anti-affinity field (see the sketch after this list); when flexing down a cluster, these will be dropped even before cancelling outstanding requests.
# What if new nodes join a cluster while a request is outstanding? Plan: don't even check for this; successor requests will refresh the node map.
# What to do when anti-affinity placement requests cannot be satisfied? Plan: leave them outstanding. This introduces a specific situation: an application cannot become quorate with anti-affinity set, even if there's capacity in the cluster. This is something that could be added to monitoring later, such as allowing the app to fail if it cannot go quorate in a specified time. (Of course, that adds a follow-on problem: what if, after failures, the minimum number of nodes cannot be satisfied?)
# what if, while there is a pending anti-affinity request, one of the live cluster nodes fails? Really, that should trigger a re-request on the same node + escalation plan. Probably best done by cancelling the outstanding request and then restarting the review & request process, so the placed request goes out first.
# cluster restart. Outstanding requests get lost, so rebuild state and carry on from there, issuing new placed and anti-affine requests.
# what if you want anti-affine but not placement history? Skip the placed phase. This may actually prove slower, as you can't issue requests in parallel. But: probability of request satisfaction is higher; there'll be no timeouts or escalation process.
# what if you only want one instance? This is covered implicitly: just issue a single anti-affine request (i.e. we don't special case it). This won't hurt and it ensures a single code path for testing.
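
For point 2, here's a sketch of the extended counting model; the field and method names are invented (the real role-status tracking differs), it just shows where the queued-for-anti-affinity counter fits in.

{code:java}
/** Illustrative counters only; names don't match the real role status class. */
final class RoleCounts {
  int live;               // containers running
  int outstanding;        // requests issued to the RM, not yet satisfied
  int deleteInProgress;   // containers being released
  int queuedAntiAffinity; // instances waiting their turn in the one-by-one cycle

  /** What the AM treats as already accounted for when reviewing role size. */
  int accountedFor() {
    return live + outstanding + deleteInProgress + queuedAntiAffinity;
  }

  /**
   * Flex down: drain the anti-affinity queue first, then return how many
   * instances still have to go via request cancellations/container releases.
   */
  int flexDown(int excess) {
    int dropped = Math.min(excess, queuedAntiAffinity);
    queuedAntiAffinity -= dropped;
    return excess - dropped;
  }
}
{code}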

h2. Testing

We could do some minimal stuff with the mock RM, but that doesn't test YARN. The minicluster doesn't work either, as there's only one host: localhost.

That leaves integration tests with clusters of size > 1. We can do this; we'll just need to skip those tests if the cluster size is < 2.
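
As an example, a sketch of the skip guard, assuming JUnit 4 style tests; the test class name is invented, the node count comes from the standard {{YarnClient}} API.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assume;
import org.junit.Before;

public class TestAntiAffinePlacement {   // hypothetical test class

  @Before
  public void requireMultiNodeCluster() throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new YarnConfiguration());
    yarn.start();
    try {
      List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
      // skip, rather than fail, every test in this class on single-node clusters
      Assume.assumeTrue("cluster has fewer than 2 running nodes", nodes.size() >= 2);
    } finally {
      yarn.stop();
    }
  }
}
{code}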

Possible tests:
- request 2 anti-affine containers with no history. Expect: instances on both nodes.
- request 2 anti-affine containers with a history. Expect: instances on both nodes. We may want to add some way to get at the placement history so we can determine whether the requests were satisfied through history vs. anti-affinity requests.
- request 2 anti-affine containers with a history for only one. Expect: instances on both nodes.
- request 1 anti-affine container with a history for a node that lacks capacity/is offline. Expect: escalation to another node.
- request more anti-affine containers than the cluster has nodes. Expect: unsatisfied requests.
- AM restart while an unsatisfied request is outstanding. Expect: the same state is eventually reached.
- kill a container while a request is outstanding. Expect: a request re-issued for that container, then a new unsatisfiable request issued.

> Support ANTI_AFFINITY_REQUIRED option
> -------------------------------------
>
>                 Key: SLIDER-82
>                 URL: https://issues.apache.org/jira/browse/SLIDER-82
>             Project: Slider
>          Issue Type: Task
>          Components: appmaster
>            Reporter: Steve Loughran
>             Fix For: Slider 2.0.0
>
>         Attachments: SLIDER-82.002.patch
>
>
> slider has an anti-affinity flag in roles (visible in resources.json?), which is ignored.
> YARN-1042 promises this for YARN, slider will need
> # flag in resources.json
> # use in container requests
> we may also want two policies: anti-affinity-desired, and -required. Then if required nodes get >1 container for the same component type on the same node, it'd have to request a new one and return the old one (Risk: getting the same one back). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)