Posted to user@storm.apache.org by Srinath C <sr...@gmail.com> on 2014/03/19 02:05:47 UTC

Distribution of tasks across the storm cluster

Hi,
   Can anyone point me to some notes on how Storm decides to distribute
tasks among its workers? The behavior I am seeing is that all tasks of a
particular type are grouped into one worker process.
   To add more detail on my use case: I have a spout that sources
tuples from a RabbitMQ cluster. I want to distribute the spout tasks across
different Storm workers so that throughput is higher and the load of
ingesting the messages is spread across all the workers in the Storm
cluster. Any suggestions on how to influence the distribution of tasks?

Thanks,
Srinath.

Re: Distribution of tasks across the storm cluster

Posted by Srinath C <sr...@gmail.com>.
Thanks for replying, Drew.
I'm using a fields grouping to stream out the tuples generated by the spouts.

But here is what I'm trying to solve: the spouts are IO-intensive. They
periodically poll data from external sources and use resources such as
persistent connections (and thus network bandwidth), file descriptors,
and memory. I was thinking that if I could distribute these spout
instances across different worker machines, the load would be spread
more evenly.

Example: I can define my spout "MySpout" with 10 tasks, but I would like
about 3 of them to run on each worker machine (not merely in each worker
process). How do I influence that kind of distribution? Right now, I see
that even when I set the number of worker processes to 3, Storm places all
the tasks of the spout into a single worker process.

Hope that clarifies.

Thanks,
Srinath
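
For readers following along, the even spread being asked for here can be sketched as a simple round-robin deal of spout executors into workers. This is a simplified model under stated assumptions, not Storm's actual scheduler code: the class name TaskPlacementSketch and the method assign are hypothetical, and executors are represented by plain integer ids.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: this is NOT Storm's scheduler code.
// TaskPlacementSketch and assign() are hypothetical names for illustration.
class TaskPlacementSketch {

    // Deal executor ids 0..numExecutors-1 round-robin into numWorkers buckets.
    static List<List<Integer>> assign(int numExecutors, int numWorkers) {
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            workers.add(new ArrayList<>());
        }
        for (int e = 0; e < numExecutors; e++) {
            workers.get(e % numWorkers).add(e);
        }
        return workers;
    }

    public static void main(String[] args) {
        // The case from the message above: 10 spout tasks, 3 workers.
        List<List<Integer>> placement = assign(10, 3);
        for (int w = 0; w < placement.size(); w++) {
            System.out.println("worker " + w + " -> " + placement.get(w));
        }
    }
}
```

With 10 executors and 3 workers, this deal gives one worker 4 executors and the other two 3 each, which is the kind of placement the question is after.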

On Fri, Mar 21, 2014 at 10:41 PM, Drew Goya <dr...@gradientx.com> wrote:

> What kind of grouping (if any) are you using on the tuples coming out of
> your spout?
>
> If you want them evenly spread across a number of worker bolts, set that
> bolt to subscribe to the stream using a shuffle grouping.
>
> http://storm.incubator.apache.org/documentation/Concepts.html
>
> Search there for "Stream groupings"

Re: Distribution of tasks across the storm cluster

Posted by Drew Goya <dr...@gradientx.com>.
What kind of grouping (if any) are you using on the tuples coming out of
your spout?

If you want them evenly spread across a number of worker bolts, set that
bolt to subscribe to the stream using a shuffle grouping.

http://storm.incubator.apache.org/documentation/Concepts.html

Search there for "Stream groupings"
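
As a rough illustration of why the choice of grouping matters: shuffle grouping distributes tuples randomly across the consuming bolt's tasks, while fields grouping sends tuples with the same field value to the same task. The sketch below models both in plain Java without the Storm API. Everything in it is an assumption made for illustration: the class name GroupingSketch, the helper methods, and the use of a hash modulo the task count to stand in for fields grouping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative model only; none of this is Storm API code.
class GroupingSketch {

    // Fields-grouping model: the same key always maps to the same task index.
    static int fieldsTarget(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Count how many tuples each task receives when routed by key.
    static int[] fieldsCounts(List<String> keys, int numTasks) {
        int[] counts = new int[numTasks];
        for (String key : keys) {
            counts[fieldsTarget(key, numTasks)]++;
        }
        return counts;
    }

    // Shuffle-grouping model: each tuple goes to a uniformly random task.
    static int[] shuffleCounts(int numTuples, int numTasks, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[numTasks];
        for (int i = 0; i < numTuples; i++) {
            counts[rng.nextInt(numTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A skewed key distribution: most tuples share the key "a".
        List<String> keys = Arrays.asList("a", "a", "a", "a", "a", "a", "b", "c");
        System.out.println("fields:  " + Arrays.toString(fieldsCounts(keys, 3)));
        System.out.println("shuffle: " + Arrays.toString(shuffleCounts(keys.size(), 3, 42L)));
    }
}
```

With a skewed key distribution, the fields-grouping model piles every tuple for the hot key onto one task, whereas the shuffle model ignores keys entirely. Note that this concerns how tuples are routed to bolt tasks, not where the tasks themselves are placed.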


On Thu, Mar 20, 2014 at 10:11 AM, Srinath C <sr...@gmail.com> wrote:

> Anyone?

Re: Distribution of tasks across the storm cluster

Posted by Srinath C <sr...@gmail.com>.
Anyone?

