Posted to user@storm.apache.org by Harald Kirsch <Ha...@raytion.com> on 2014/01/09 14:35:01 UTC
Best practices for host-bound source and sink?
Hi all,
suppose you need to process input available on source.host and the
results should finally end up on dest.host. A storm topology shall do
the processing.
It is easy to write a spout that fetches the data and emits it into the
topology. Similarly, a sink bolt can write the result somewhere.
But now suppose that the data is available only locally, in the file
system of source.host. Is it possible, and natural, to make source.host a
machine in the Storm cluster but somehow make sure that *only* the
spout is executed on source.host? Similarly, would it be possible to
bind a sink bolt to one specific machine, dest.host?
If this is not possible, or not the preferred way to do it, are there any
specific techniques for providing input to a spout beyond whatever
remote access methods happen to be available (smb, nfs, ssh, http)?
Thanks for any hints,
Harald.
Re: Best practices for host-bound source and sink?
Posted by Harald Kirsch <Ha...@raytion.com>.
Cool, thanks for the link. It explains very well how to make sure a
specific component runs on a specific machine.
I assume that if I want this special component to be the only component
on that machine, I could allow only one slot on this machine?
But there seems to be no general feature that would keep other
components off the special machine?
(Maybe I should just go off and read the source of DefaultScheduler, but
hints are appreciated :-)
Harald.
On 09.01.2014 18:10, Susheel Kumar Gadalay wrote:
> Use a custom scheduler to run the spout and bolt on the designated machine.
>
> See this: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
--
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com
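On keeping other components off the special machine: limiting a supervisor to one slot (for example via supervisor.slots.ports in its storm.yaml) restricts it to one worker process, but a single worker can still run executors of several components of the same topology, so the slot limit alone may not be enough. Assuming the exclusion is enforced by the custom scheduler itself, the rule could look like the Storm-free Java model below: any host that some component is pinned to is reserved for its pinned components, and every other component is spread round-robin over the remaining hosts. ExclusivePlacement, assign, and the host/component names are hypothetical, not Storm API; a real scheduler would work on Storm's supervisor and slot objects instead of strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of "reserve a machine for its pinned component(s)". Not Storm API. */
public class ExclusivePlacement {

    /**
     * @param components component ids in placement order
     * @param pinnedHost component id -> the only host it may run on
     * @param hosts      all hosts in the (modelled) cluster
     * @return component id -> chosen host
     */
    public static Map<String, String> assign(List<String> components,
                                             Map<String, String> pinnedHost,
                                             List<String> hosts) {
        // Every host that some component is pinned to is reserved.
        Set<String> reserved = new HashSet<>(pinnedHost.values());
        List<String> shared = new ArrayList<>();
        for (String host : hosts) {
            if (!reserved.contains(host)) {
                shared.add(host);
            }
        }

        Map<String, String> placement = new LinkedHashMap<>();
        int next = 0; // round-robin cursor over the shared hosts
        for (String component : components) {
            String pin = pinnedHost.get(component);
            if (pin != null) {
                // Pinned: only the named host will do, because the data lives there.
                if (!hosts.contains(pin)) {
                    throw new IllegalStateException(
                        component + " is pinned to " + pin + ", which is not in the cluster");
                }
                placement.put(component, pin);
            } else {
                // Unpinned: never placed on a reserved host.
                if (shared.isEmpty()) {
                    throw new IllegalStateException("no unreserved host left for " + component);
                }
                placement.put(component, shared.get(next % shared.size()));
                next++;
            }
        }
        return placement;
    }
}
```

With hosts source.host, worker1, worker2, dest.host and pins file-spout -> source.host, sink-bolt -> dest.host, two unpinned bolts would land on worker1 and worker2, and nothing else would be offered the two reserved machines.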
Re: Best practices for host-bound source and sink?
Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
Use a custom scheduler to run the spout and bolt on the designated machine.
See this: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
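For context, and to be checked against your Storm version: in the 0.9 line a pluggable scheduler implements backtype.storm.scheduler.IScheduler and is enabled on Nimbus through the storm.scheduler setting, and each supervisor can advertise tags through supervisor.scheduler.meta in its storm.yaml. Because a real IScheduler needs the Storm jars, the sketch below is only a Storm-free Java model of the placement decision: each pinned component is matched to the supervisor whose "name" tag equals the wanted value (e.g. "source.host"), and is left unassigned if that supervisor is missing or has no free slot. PinningModel, its methods, and the tag values are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a pinning scheduler's decision. Not Storm API. */
public class PinningModel {

    /** Returns the id of the first supervisor whose "name" tag equals wantedName. */
    public static Optional<String> findSupervisor(Map<String, Map<String, String>> metaBySupervisor,
                                                  String wantedName) {
        for (Map.Entry<String, Map<String, String>> e : metaBySupervisor.entrySet()) {
            if (wantedName.equals(e.getValue().get("name"))) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /**
     * Places each pinned component on the supervisor tagged for it.
     * Model simplification: each component consumes one free slot.
     * Components whose supervisor is missing or full are left out of the result.
     */
    public static Map<String, String> placePinned(Map<String, String> wantedNameByComponent,
                                                  Map<String, Map<String, String>> metaBySupervisor,
                                                  Map<String, Integer> freeSlotsBySupervisor) {
        Map<String, Integer> free = new HashMap<>(freeSlotsBySupervisor); // don't mutate the caller's map
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : wantedNameByComponent.entrySet()) {
            Optional<String> supervisor = findSupervisor(metaBySupervisor, e.getValue());
            if (!supervisor.isPresent()) {
                continue; // no supervisor carries this tag
            }
            int slots = free.getOrDefault(supervisor.get(), 0);
            if (slots > 0) {
                placement.put(e.getKey(), supervisor.get());
                free.put(supervisor.get(), slots - 1);
            }
        }
        return placement;
    }
}
```

Leaving a component unassigned, rather than falling back to another supervisor, is deliberate here: a spout placed on a machine without the local files would run but read nothing. The one-slot-per-component accounting is a simplification; in Storm, several executors can share one worker slot.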