Posted to user@storm.apache.org by xu...@gmail.com on 2015/03/18 18:05:13 UTC

How to programmatically activate a spout from inside the topology

Hi,


I am building a topology that first needs to read some persisted data (accounts, recovery point, etc.) before any of the bolts can start processing tuples.

Ideally, the spout would start emitting tuples only after all the required data has been read into memory (maybe on the spouts, maybe on the bolts). What's the general approach for such a use case?

Alternatively, I could make each bolt ignore or fail tuples until it is ready to process, but that means either lost messages or futile spout replays.


Thanks,
Jia

Re: How to programmatically activate a spout from inside the topology

Posted by xu...@gmail.com.
Many thanks, Jens. This is exactly what I need.

-Jia

On Wed, Mar 18, 2015 at 10:49 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Jia,
>
> Quoting xujiaxj@gmail.com:
>> Hi,
>>
>> I am building a topology that first needs to read some persisted
>> data (accounts, recovery point, etc.) before any of the bolts can
>> start processing tuples.
>>
>> Ideally, the spout would start emitting tuples only after all the
>> required data has been read into memory (maybe on the spouts, maybe
>> on the bolts). What's the general approach for such a use case?
>
> I don't know about a "general approach", but I'd make the spout send
> an initial "special management tuple" (i.e. on a dedicated stream) to
> all bolts and wait until the ack comes back. Every bolt can initialize
> on that message (if it has not already done so during prepare()) and
> ack the tuple only once its initialization is done.
>
>> Alternatively, I could make each bolt ignore or fail tuples until it
>> is ready to process, but that means either lost messages or futile
>> spout replays.
>
> Doesn't sound production-level to me ;)
>
> Regards,
> Jens

Re: How to programmatically activate a spout from inside the topology

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Jia,

Quoting xujiaxj@gmail.com:
> Hi,
>
> I am building a topology that first needs to read some persisted
> data (accounts, recovery point, etc.) before any of the bolts can
> start processing tuples.
>
> Ideally, the spout would start emitting tuples only after all the
> required data has been read into memory (maybe on the spouts, maybe
> on the bolts). What's the general approach for such a use case?

I don't know about a "general approach", but I'd make the spout send
an initial "special management tuple" (i.e. on a dedicated stream) to
all bolts and wait until the ack comes back. Every bolt can initialize
on that message (if it has not already done so during prepare()) and
ack the tuple only once its initialization is done.
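
The spout-side bookkeeping for such a handshake could look roughly like
the sketch below. It is plain Java with no Storm dependency, so it only
models the gating logic. StartupGate, Action and INIT_MSG_ID are
hypothetical names made up for illustration, not part of Storm. The
sketch assumes the spout's nextTuple(), ack() and fail() callbacks are
all invoked on the same thread, so it does no locking.

```java
/** Hypothetical helper: tracks whether the one-off init tuple was fully acked. */
final class StartupGate {

    /** What the spout should do on the current nextTuple() call. */
    enum Action { EMIT_INIT, WAIT, EMIT_DATA }

    private enum State { NOT_SENT, PENDING, READY }

    /** Message id reserved for the init tuple (an assumption; any unique id works). */
    static final String INIT_MSG_ID = "__init__";

    private State state = State.NOT_SENT;

    /** Call from nextTuple(): decides between init tuple, waiting, or real data. */
    Action next() {
        switch (state) {
            case NOT_SENT:
                state = State.PENDING;   // init tuple is now in flight
                return Action.EMIT_INIT;
            case PENDING:
                return Action.WAIT;      // emit nothing until the ack arrives
            default:
                return Action.EMIT_DATA; // all bolts have acked, open the gate
        }
    }

    /** Call from ack(Object msgId): opens the gate when the init tuple is acked. */
    void onAck(Object msgId) {
        if (state == State.PENDING && INIT_MSG_ID.equals(msgId)) {
            state = State.READY;
        }
    }

    /** Call from fail(Object msgId): re-arms the gate so the init tuple is re-sent. */
    void onFail(Object msgId) {
        if (state == State.PENDING && INIT_MSG_ID.equals(msgId)) {
            state = State.NOT_SENT;
        }
    }

    boolean isReady() {
        return state == State.READY;
    }
}
```

In a real spout, nextTuple() would call next(), emit the init tuple on
its own stream with INIT_MSG_ID as the message id when it gets
EMIT_INIT, and simply return when it gets WAIT. The bolts would
subscribe to that stream with an all grouping so that every task
receives the tuple. Storm calls the spout's ack() only after the whole
tuple tree has been acked, so bolts that are not wired directly to the
spout would need the init tuple forwarded to them, anchored, by the
bolts upstream. If initialization can take longer than
topology.message.timeout.secs, the init tuple will be failed and
re-sent, so either raise that timeout or make the bolts' init
idempotent.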

> Alternatively, I could make each bolt ignore or fail tuples until it
> is ready to process, but that means either lost messages or futile
> spout replays.

Doesn't sound production-level to me ;)

Regards,
Jens