Posted to user@storm.apache.org by Harald Kirsch <Ha...@raytion.com> on 2014/01/09 14:35:01 UTC

Best practices for host-bound source and sink?

Hi all,

Suppose you need to process input available on source.host, and the
results should finally end up on dest.host. A Storm topology is to do
the processing.

It is easy to write a spout that fetches the data and emits it into the
topology. Similarly, a sink bolt can write the result somewhere.

But now suppose that the data is available only locally on source.host
in the file system. Is it possible and natural to make source.host a
machine in the Storm cluster but somehow make sure that *only* the
spout is executed on source.host? Similarly, would it be possible to
bind a sink bolt to one specific machine, the dest.host?

If this is not possible or not the preferred way to do it, are there any
specific techniques used to provide input to a spout beyond whatever
remote access methods happen to be available (smb, nfs, ssh, http)?

Thanks for any hints,
Harald.


Re: Best practices for host-bound source and sink?

Posted by Harald Kirsch <Ha...@raytion.com>.
Cool, thanks for the link. It explains very well how to make sure a
specific component runs on a specific machine.

I assume if I want this special component to be the only component on 
that machine, I could allow only one slot on this machine?

But there seems to be no general feature that would keep other
components off the special machine?

(Maybe I should just go off and read the source of DefaultScheduler, but
hints are appreciated :-)

Harald.

On 09.01.2014 18:10, Susheel Kumar Gadalay wrote:
> Use a custom scheduler to run the spout and bolt on the designated machine.
>
> See this : http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/

-- 
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com

Re: Best practices for host-bound source and sink?

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
Use a custom scheduler to run the spout and bolt on the designated machine.

See this: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
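
To illustrate the placement idea only, here is a minimal sketch in plain
Java. It is not taken from the linked article and does not use Storm's
scheduler API; a real implementation would have to go through Storm's
pluggable scheduler interface instead. The class name HostPinningSketch,
the component names, the host worker1 and the "host:port" slot strings
are all hypothetical. The assumption is that each pinned component gets a
slot on its host, and that any host with a pinned component is kept free
of all other components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of host pinning; not Storm's scheduler API.
class HostPinningSketch {

    // Maps each component to a "host:port" slot.
    // pinned maps a component name to the one host it must run on.
    static Map<String, String> assign(List<String> components,
                                      Map<String, String> pinned,
                                      List<String> slots) {
        // Slots on a host that some component is pinned to are reserved,
        // so only the remaining slots are shared by unpinned components.
        List<String> shared = new ArrayList<>();
        for (String slot : slots) {
            if (!pinned.containsValue(hostOf(slot))) {
                shared.add(slot);
            }
        }

        Map<String, String> result = new LinkedHashMap<>();
        int next = 0;
        for (String component : components) {
            String host = pinned.get(component);
            if (host != null) {
                // Pinned: take the first slot on the designated host.
                result.put(component, firstSlotOn(host, slots));
            } else {
                if (shared.isEmpty()) {
                    throw new IllegalStateException("no unreserved slot");
                }
                // Unpinned: spread round-robin over the shared slots.
                result.put(component, shared.get(next++ % shared.size()));
            }
        }
        return result;
    }

    static String hostOf(String slot) {
        return slot.substring(0, slot.indexOf(':'));
    }

    static String firstSlotOn(String host, List<String> slots) {
        for (String slot : slots) {
            if (hostOf(slot).equals(host)) {
                return slot;
            }
        }
        throw new IllegalStateException("no slot on " + host);
    }

    public static void main(String[] args) {
        Map<String, String> pinned = new HashMap<>();
        pinned.put("file-spout", "source.host");
        pinned.put("sink-bolt", "dest.host");

        List<String> slots = Arrays.asList(
                "source.host:6700", "worker1:6700", "worker1:6701", "dest.host:6700");
        List<String> components = Arrays.asList(
                "file-spout", "parse-bolt", "count-bolt", "sink-bolt");

        // Print one "component -> slot" line per component.
        for (Map.Entry<String, String> e : assign(components, pinned, slots).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In this model the spout lands on source.host, the sink bolt on dest.host,
and the two processing bolts share the worker1 slots. Configuring a single
slot on source.host and dest.host would limit each machine to one worker,
but it is the reservation step that keeps other components away.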
