Posted to user@storm.apache.org by "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <mr...@bloomberg.net> on 2018/02/07 23:03:25 UTC

Limiting Available Machines for Topology

Given a multi-node Storm cluster, is it possible to ensure that a topology submitted on a specific machine runs only on that machine? More specifically, given a cluster of machines A, B, and C, if a topology is submitted from machine A, is there a way to guarantee that:

-The topology runs on machine A.
-If machine A crashes, the topology is not re-run on another machine.

I am guessing there isn't and that the answer to this is to run a leader nimbus per machine (nimbus.seeds: ["localhost"]), but I wanted to see if there was a way to do this that I am missing.

Re: Limiting Available Machines for Topology

Posted by Bobby Evans <bo...@apache.org>.
Placement is handled entirely by the scheduler.  The default schedulers do
not have this capability; they just try to spread work round-robin across
the cluster.  That works well for small clusters built for a very specific
purpose, but not so well for large ones.  I have not used tag-aware
scheduling, but it looks like it works and there are people using it in
production.
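
To make the contrast concrete, here is a minimal toy sketch in Java of
round-robin spreading versus pinning to a single node.  It is only a model:
the class and method names (PlacementSketch, roundRobin, pinned) are
hypothetical and are not part of Storm's scheduler API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from Storm's scheduler API.
class PlacementSketch {

    // Round-robin: worker w goes to live node (w mod n), spreading work over the cluster.
    static Map<String, List<Integer>> roundRobin(List<String> liveNodes, int workers) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        if (liveNodes.isEmpty()) {
            return plan; // nothing alive, nothing scheduled
        }
        for (String node : liveNodes) {
            plan.put(node, new ArrayList<>());
        }
        for (int w = 0; w < workers; w++) {
            plan.get(liveNodes.get(w % liveNodes.size())).add(w);
        }
        return plan;
    }

    // Pinned: every worker goes to the one required node, or nothing is scheduled at all.
    static Map<String, List<Integer>> pinned(List<String> liveNodes, String required, int workers) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        if (!liveNodes.contains(required)) {
            return plan; // required node is down: the topology stays unassigned
        }
        List<Integer> all = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            all.add(w);
        }
        plan.put(required, all);
        return plan;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("A", "B", "C");
        System.out.println(roundRobin(cluster, 4));            // spread over A, B, C
        System.out.println(pinned(cluster, "A", 4));           // everything on A
        System.out.println(pinned(List.of("B", "C"), "A", 4)); // A is down: empty plan
    }
}
```

The pinned variant is what the original question asks for: when A is not
among the live nodes, the plan is empty rather than moved to B or C.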

Resource Aware Scheduling is not going to work to force your topology to
run on one specific node.  It can accept hints for which node(s) you want
and which you don't, but those are just hints.  We also recently added
generic resources, so you could define your own resources and use them like
tags.  Generic resources are still a work in progress: we are starting to
roll them out to production, but we have not even updated the UI to show
them, so there may still be some bugs and the feature needs more polish.
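
As a rough sketch of what passing such hints could look like, the Java
snippet below builds a plain topology config map.  The two key names are
written from memory of Storm 2.x's Config class and should be checked
against the version you run; the class name SchedulingHintsSketch and the
host names are hypothetical.  No Storm classes are used, so this only shows
the shape of the config, not a submission.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SchedulingHintsSketch {

    // Assumed key names (recalled from Storm 2.x's Config class; verify before use).
    static final String FAVORED = "topology.scheduler.favored.nodes";
    static final String UNFAVORED = "topology.scheduler.unfavored.nodes";

    // Build a topology config carrying placement hints. These are preferences only:
    // the scheduler may still place workers elsewhere.
    static Map<String, Object> hints(List<String> prefer, List<String> avoid) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(FAVORED, prefer);
        conf.put(UNFAVORED, avoid);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical host names standing in for machines A, B, and C.
        Map<String, Object> conf = hints(List.of("machineA"), List.of("machineB", "machineC"));
        System.out.println(conf.get(FAVORED));
        System.out.println(conf.get(UNFAVORED));
    }
}
```

In a real submission a map like this would be merged into the topology
config, and because these are only hints it would not by itself satisfy the
"never re-run elsewhere" requirement.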

The big thing to be careful of with what you are trying to do is fault
tolerance and failure domains.  If there is really one and only one node
that your topology will work on, then when that node goes down you are done
for.  The same goes for one and only one rack.

- Bobby

On Wed, Feb 7, 2018 at 5:55 PM Arnaud BOS <ar...@gmail.com> wrote:

> I **guess** you could use the “tag-aware scheduling” described here:
> https://inside.edited.com/taking-control-of-your-apache-storm-cluster-with-tag-aware-scheduling-b60aaaa5e37e
> Or maybe bend the "resource aware scheduler" presented here:
> https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
> to do what you want, but that sounds hackish.
>
> Anyways, I've never used any of them so I'm just sending the links for
> further reading.
>
> Hope this helps.
>
> On Thu, Feb 8, 2018, 12:12 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
> mrathbun1@bloomberg.net> wrote:
>
>> Given a multiple node Storm cluster, is it possible to ensure that a
>> topology submitted on a specific machine runs only on that machine? More
>> specifically, given a cluster of machines A, B, and C, if a topology is
>> submitted from machine A, is there a way to guarantee that:
>>
>> -The topology runs on machine A.
>> -If machine A crashes, the topology is not re-run on another machine.
>>
>> I am guessing there isn't and that the answer to this is to run a leader
>> nimbus per machine (nimbus.seeds: ["localhost"]), but I wanted to see if
>> there was a way to do this that I am missing.
>>
>

Re: Limiting Available Machines for Topology

Posted by Arnaud BOS <ar...@gmail.com>.
I **guess** you could use the “tag-aware scheduling” described here:
https://inside.edited.com/taking-control-of-your-apache-storm-cluster-with-tag-aware-scheduling-b60aaaa5e37e
Or maybe bend the "resource aware scheduler" presented here:
https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
to do what you want, but that sounds hackish.

Anyways, I've never used any of them so I'm just sending the links for
further reading.

Hope this helps.

On Thu, Feb 8, 2018, 12:12 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
mrathbun1@bloomberg.net> wrote:

> Given a multiple node Storm cluster, is it possible to ensure that a
> topology submitted on a specific machine runs only on that machine? More
> specifically, given a cluster of machines A, B, and C, if a topology is
> submitted from machine A, is there a way to guarantee that:
>
> -The topology runs on machine A.
> -If machine A crashes, the topology is not re-run on another machine.
>
> I am guessing there isn't and that the answer to this is to run a leader
> nimbus per machine (nimbus.seeds: ["localhost"]), but I wanted to see if
> there was a way to do this that I am missing.
>