Posted to dev@slider.apache.org by Steve Loughran <st...@hortonworks.com> on 2015/11/23 18:39:45 UTC

Anti-affinity placement in develop branch!


Hi

I've just merged and checked the core of the AA placement work into develop

Features

  1.  Guaranteed anti-affinity placement: if there aren't enough live hosts for all the requested containers, you don't get all the containers. You will not get multiple instances on the same node.
  2.  Incremental start of containers.
  3.  The web UI gives some information about what is going on (and why the application isn't complete). See SLIDER-979 for screenshots.

All you have to do for this is to set yarn.component.placement.policy=4 on the component:

"yarn.component.placement.policy": "4"

{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
  },
  "components": {
    "slider-appmaster": {
      "yarn.memory": "256"
    },
    "SLEEP_100": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "128"
    },
    "SLEEP_LONG": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "4",
      "yarn.memory": "128",
      "yarn.component.placement.policy": "4"   <-- HERE
    }
  }
}


That tells Slider that the "SLEEP_LONG" instances must come up on separate nodes. (The <-- HERE marker above is just highlighting; it isn't part of the JSON, so strip it before using the file.)
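To try it out with the usual create command, assuming the file above is saved as resources.json, you have a matching appConfig.json, and you call the application instance "sleep-cluster" (all placeholder names):

slider create sleep-cluster --template appConfig.json --resources resources.json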

The code is in, but it needs more testing. Please set this placement policy on your components and see what happens. If any instances come up on the same machine: bug. If you don't get all of them up, even though there are enough spare machines: bug.
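A quick way to check for the first case, assuming you've collected the hostname of every container of the component into a file (hosts.txt here, one hostname per line, for example copied from the AM web UI):

sort hosts.txt | uniq -d

If that prints anything, two or more instances landed on the same host.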

On that topic, the "slider nodes" command gives you a list of all nodes in the cluster:

slider nodes  [<instance>] [--healthy] [--label <label>]

Invoke slider nodes without a cluster name and you get a JSON summary from YARN itself. Invoke it with a cluster name and you get the AM's view of the world: what nodes are there, its view of their health, what components are on each node, and historical data.

If you do find a bug in the AA placement code, grabbing that JSON file for the cluster in question would be really helpful in understanding what's up.
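For example, assuming the application instance is called sleep-cluster, you can capture that output with:

slider nodes sleep-cluster > nodes.json

and attach nodes.json to the bug report.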

Not in this release

There's no use of historical information when bringing up an AA cluster. It's planned; I just wanted to get the core placement working first.

Likely trouble spots

Here is what I haven't tested fully, and which will need more field trials:

  1.  How frequent and accurate are the node updates coming from the RM to the Slider AM? We're relying on those to know when new hosts are added, or when existing hosts that were unavailable come back online. Without those notifications, placement requests won't include those hostnames.
  2.  Scale surprises: does asking for every host except those already in use hit limits?
  3.  Handling unreliable nodes. There's no blacklisting here, nor any exclusion of untrusted nodes from the explicit list requested. Does this create a bias towards rescheduling work on unreliable servers?
  4.  Startup time. How long does it take? I'm assuming, at a minimum, 10s per desired component instance, even when there is cluster capacity. But you should not see delays if you are asking for AA placements of different component types; those requests are all issued in parallel.
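By those numbers, the four SLEEP_LONG instances in the example above would need at least roughly 40s to all come up, even on an idle cluster; SLEEP_100 has no AA policy set, so it is requested as before.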

Please download and play with it: I'm not doing any more to it this week.

-Steve





Re: Anti-affinity placement in develop branch!

Posted by Josh Elser <jo...@gmail.com>.
Yay, great work, Steve!
