Posted to user@storm.apache.org by Spico Florin <sp...@gmail.com> on 2014/08/06 11:08:34 UTC

Task colocation in the same JVM or same node

Hello!
   I have a use case where two bolts need to be colocated, either in the
same worker JVM or on the same node.
  We would like to know about this feature for the following reasons:
1. Computing the time it takes for a tuple to be processed by the whole
topology
  Suppose that you have the topology:
    Spout->B1->B2->BoltMeasureTime
(where BoltMeasureTime is the bolt where we would like to compute the
total time spent by the tuple in the topology).
We would like BoltMeasureTime to be placed in the same JVM as the
Spout, or on the same node.

2. Suppose that you have a Spout that is consuming data from a database.
For performance reasons you might want to place the Spout near
the database.

I know that Nimbus is responsible for spreading the tasks among the workers
using a round-robin algorithm, but I'm wondering if there is a different
way to specify where the tasks are executed.

I look forward to your suggestions/comments.

Best regards,
  Florin
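
A minimal sketch of the measurement in point 1, assuming the spout attaches
a wall-clock emit timestamp to each tuple and BoltMeasureTime subtracts it
from its own clock. The function names and the injectable `clock` parameter
are hypothetical and only there to make the example testable; nothing here is
Storm API. It illustrates why colocation matters: when the two readings come
from different machines, any clock offset between them ends up in the result.

```python
import time


def stamp(clock=time.time):
    """Emit timestamp in milliseconds, as the spout would attach it to a tuple."""
    return int(clock() * 1000)


def elapsed_ms(emitted_at_ms, clock=time.time):
    """Milliseconds since emit, measured with the measuring bolt's own clock."""
    return int(clock() * 1000) - emitted_at_ms


if __name__ == "__main__":
    emitted = stamp()
    time.sleep(0.05)  # stand-in for the work done by B1 and B2
    print("latency in ms:", elapsed_ms(emitted))
```

If the spout and the measuring bolt run on the same node, both calls read the
same system clock and the difference is the time spent in the topology. On
different nodes the result is only as accurate as the clock synchronisation
between them.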

Re: Task colocation in the same JVM or same node

Posted by Nathan Leung <nc...@gmail.com>.

You can try googling "storm pluggable scheduler" and use the Google cached
version of the page.  Also, the GitHub link (this one?
https://github.com/xumingming/storm-lib/blob/master/src/jvm/storm/DemoScheduler.java)
works for me.

-Nathan


On Wed, Aug 6, 2014 at 10:26 AM, Spico Florin <sp...@gmail.com> wrote:

> Hello!
>  Thank you very much for your reply. I had a look, but unfortunately the
> page is loading slowly on my side and I could not see the entire page. If
> you could save the blog page as a PDF file and send it to the forum, it
> would be very helpful. Also, in the article the reference to the pluggable
> scheduler points to a GitHub page that doesn't exist. Do you know
> if there is updated documentation on this subject?
>
> Thanks in advance.
>   Florin

Re: Task colocation in the same JVM or same node

Posted by Spico Florin <sp...@gmail.com>.

Hello!
 Thank you very much for your reply. I had a look, but unfortunately the page
is loading slowly on my side and I could not see the entire page. If you could
save the blog page as a PDF file and send it to the forum, it would
be very helpful. Also, in the article the reference to the pluggable
scheduler points to a GitHub page that doesn't exist. Do you know
if there is updated documentation on this subject?

Thanks in advance.
  Florin


On Wed, Aug 6, 2014 at 3:16 PM, Nathan Leung <nc...@gmail.com> wrote:

> You would need to design a custom scheduler:
> http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/

Re: Task colocation in the same JVM or same node

Posted by Nathan Leung <nc...@gmail.com>.

You would need to design a custom scheduler:
http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
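
To make the idea concrete, here is a small model of the placement decision
such a scheduler would make. It is not Storm code: as far as I know, a real
custom scheduler is a Java class implementing
backtype.storm.scheduler.IScheduler and enabled on Nimbus through the
storm.scheduler setting. The `place` function, the slot names and the
`colocate` set below are all hypothetical. Components listed in `colocate`
are pinned to one slot; everything else is spread round-robin.

```python
from collections import defaultdict
from itertools import cycle


def place(executors, slots, colocate):
    """Map executors to slots, keeping the `colocate` components together.

    executors: list of (component, executor_id) pairs
    slots:     list of free slot names, e.g. "node1:6700" (made-up names)
    colocate:  set of component names that must share a single slot
    """
    if not slots:
        raise ValueError("no free slots")
    assignment = defaultdict(list)
    pinned_slot = slots[0]   # every pinned component lands here
    others = cycle(slots)    # the rest is distributed round-robin
    for component, executor_id in executors:
        slot = pinned_slot if component in colocate else next(others)
        assignment[slot].append((component, executor_id))
    return dict(assignment)


if __name__ == "__main__":
    topology = [("spout", 0), ("b1", 0), ("b2", 0), ("measure", 0)]
    free = ["node1:6700", "node2:6700"]
    for slot, placed in place(topology, free, {"spout", "measure"}).items():
        print(slot, placed)
```

With the topology from the original question, the spout and the measuring
bolt end up in the same slot (that is, the same worker JVM), while B1 and B2
are distributed over the available slots as usual.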

