
Posted to user@storm.apache.org by jeff saremi <je...@hotmail.com> on 2014/07/04 13:59:29 UTC

Choosing where your tasks run in Storm

I'm wondering whether this concept applies to Storm and, if so, whether there's a way to do it.

I'd like to limit the machines that certain spouts or bolts run on. There are many reasons for this, but as one example, suppose I have a bolt that is just a proxy for some legacy service. I want to monitor that service by way of the bolt and use it in my topology.
Another way of looking at it: I want a topology that spans different "classes" of machines.
Let's say I have three classes of machines: small, medium, and large. Some topologies are limited to a single class of machines, whereas others need to span two or more classes.
How can I do this in Storm?
Thanks
Jeff

RE: Choosing where your tasks run in Storm

Posted by jeff saremi <je...@hotmail.com>.

Michael and Andrew, thanks so much.

Re: Choosing where your tasks run in Storm

Posted by Michael Rose <mi...@fullcontact.com> on Fri, 4 Jul 2014 14:41:17 -0600.

You can make it happen with a custom scheduler; see this article:

http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/

But it's nothing I've seriously attempted before, and the existing schedulers
are written in Clojure. It's certainly not impossible, but, as Andrew said,
it might well be easier to run separate clusters that share a ZooKeeper (ZK)
cluster.
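
A custom scheduler of the kind the article describes comes down to one placement rule: put a component's worker only on a supervisor of the required machine class. The Java sketch below models just that rule with no Storm dependency (Java 16+ for `record`). `Supervisor`, `place`, the component names, and the small/large tags are all hypothetical. In Storm itself this logic would sit inside an implementation of the pluggable `IScheduler` interface, registered on Nimbus through the `storm.scheduler` setting, with each supervisor advertising its class through `supervisor.scheduler.meta` in its storm.yaml; verify those hooks against the article and the documentation for your Storm version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClassAwarePlacement {

    // Hypothetical stand-in for a supervisor node: host name, the machine class
    // it was tagged with, and how many free worker slots it offers.
    record Supervisor(String host, String machineClass, int freeSlots) {}

    // Hypothetical placement rule: give each component one worker slot on the
    // first supervisor (in list order) whose class matches and that still has
    // a free slot. Returns component -> "host:slotNumber".
    static Map<String, String> place(Map<String, String> requiredClass,
                                     List<Supervisor> supervisors) {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (Supervisor s : supervisors) {
            remaining.put(s.host(), s.freeSlots());
        }
        Map<String, String> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requiredClass.entrySet()) {
            String component = entry.getKey();
            String wantedClass = entry.getValue();
            String chosen = null;
            for (Supervisor s : supervisors) {
                int free = remaining.get(s.host());
                if (s.machineClass().equals(wantedClass) && free > 0) {
                    remaining.put(s.host(), free - 1);
                    // slot number = number of slots already used on this host
                    chosen = s.host() + ":" + (s.freeSlots() - free);
                    break;
                }
            }
            if (chosen == null) {
                // Refuse to fall back to a machine of the wrong class.
                throw new IllegalStateException(
                        "no free " + wantedClass + " slot for " + component);
            }
            plan.put(component, chosen);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Supervisor> cluster = List.of(
                new Supervisor("small-1", "small", 2),
                new Supervisor("large-1", "large", 1),
                new Supervisor("large-2", "large", 1));

        Map<String, String> required = new LinkedHashMap<>();
        required.put("legacy-proxy-bolt", "small"); // pinned near the legacy service
        required.put("parse-bolt", "large");
        required.put("score-bolt", "large");

        System.out.println(place(required, cluster));
    }
}
```

Running `main` prints `{legacy-proxy-bolt=small-1:0, parse-bolt=large-1:0, score-bolt=large-2:0}`. The sketch uses first-fit and fails loudly when no slot of the right class is free; a real scheduler would also have to leave any components and topologies it does not pin to Storm's default scheduling.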

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
michael@fullcontact.com

Re: Choosing where your tasks run in Storm

Posted by Andrew Montalenti <an...@parsely.com> on Fri, Jul 4, 2014 at 10:28 AM.

I don't think this is possible right now, though I have thought about the
same thing before. It *might* be that Storm's support for YARN eventually
leads to this kind of thing, but I don't know much about it. For now,
you're best off running separate Storm clusters for different classes of
machines. You could consider putting Kafka queues between them to preserve
message-reliability guarantees across topologies (e.g. have your I/O-bound
topology read from Kafka and write to Kafka, and have your CPU-bound
topology read from the Kafka topic produced by the first topology).
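
The reliability argument for putting Kafka between the clusters is that a topic is a durable, replayable log: the downstream topology records how far it has read and advances that position only after processing a message, so a crash costs a replay rather than a lost message. The Java sketch below models only that idea; `TopicLog` and `Consumer` are hypothetical stand-ins, not the Kafka client API. In practice the CPU-bound topology would read through a Kafka spout and the I/O-bound topology would write through a bolt that produces to Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class KafkaBridgeModel {

    // Hypothetical stand-in for a Kafka topic partition: an append-only log
    // addressed by offset. Reading a record does not remove it.
    static final class TopicLog {
        private final List<String> records = new ArrayList<>();
        void append(String record) { records.add(record); }
        int endOffset() { return records.size(); }
        String read(int offset) { return records.get(offset); }
    }

    // Hypothetical consumer for the downstream (CPU-bound) topology. It
    // advances its committed offset only after a record is processed.
    static final class Consumer {
        private int committedOffset = 0; // stands in for a durably stored offset
        final List<String> output = new ArrayList<>();

        // Processes records until the end of the log or until `process` throws.
        // Returns true if it reached the end of the log.
        boolean run(TopicLog log, Function<String, String> process) {
            while (committedOffset < log.endOffset()) {
                String record = log.read(committedOffset);
                try {
                    output.add(process.apply(record));
                } catch (RuntimeException crash) {
                    return false; // offset not advanced, so the record is replayed
                }
                committedOffset++; // commit after success: at-least-once delivery
            }
            return true;
        }
    }

    public static void main(String[] args) {
        TopicLog topic = new TopicLog();
        // The upstream (I/O-bound) topology writes what it fetched to the topic.
        for (String item : List.of("a", "b", "c")) topic.append("fetched:" + item);

        Consumer downstream = new Consumer();
        boolean[] crashPending = {true};
        Function<String, String> process = r -> {
            if (r.equals("fetched:b") && crashPending[0]) {
                crashPending[0] = false;
                throw new IllegalStateException("simulated worker crash");
            }
            return r.toUpperCase();
        };

        System.out.println(downstream.run(topic, process)); // stops at "b"
        System.out.println(downstream.run(topic, process)); // resumes at "b"
        System.out.println(downstream.output);
    }
}
```

Running `main` prints `false`, then `true`, then `[FETCHED:A, FETCHED:B, FETCHED:C]`: the first run stops at the simulated crash on `b`, and the restart resumes from the last committed offset. Committing after processing gives at-least-once delivery, so if a worker dies after emitting output but before committing, that message is processed twice and downstream side effects should tolerate duplicates.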

---
Andrew Montalenti
Co-Founder & CTO
http://parse.ly