Posted to user@storm.apache.org by xu...@gmail.com on 2015/03/18 18:05:13 UTC
How to programmatically activate a spout from inside the topology
Hi,
I am building a topology that first needs to read some persisted data (accounts, recovery point, etc.) before the bolts can start processing tuples.
Ideally the spout would start emitting tuples only after all the required data has been read into memory (maybe on the spouts, maybe on the bolts). What’s the general approach for dealing with such a use case?
Alternatively, I could make each bolt ignore or fail tuples until it’s ready to process them, but that means either message loss or futile spout replays.
Thanks,
Jia
Re: How to programmatically activate a spout from inside the topology
Posted by xu...@gmail.com.
Many thanks, Jens. This is exactly what I need.
-Jia
On Wed, Mar 18, 2015 at 10:49 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:
> Hi Jia,
> Quoting xujiaxj@gmail.com:
>> Hi,
>>
>> I am building a topology that first needs to read some
>> persisted data (accounts, recovery point, etc.) before the bolts
>> can start processing tuples.
>>
>> Ideally the spout would start emitting tuples only after all the
>> required data has been read into memory (maybe on the spouts, maybe
>> on the bolts). What’s the general approach for dealing with such a
>> use case?
> I dunno about the "general approach", but I'd make the spout send an
> initial "special management tuple" (i.e. on a separate channel) to all
> bolts and wait until the ack comes back. Every bolt can initialize
> on that message (if not already during prepare()) and only ack the
> tuple once the init is done.
>> Alternatively, I could make each bolt ignore or fail tuples until
>> it’s ready to process them, but that means either message loss or
>> futile spout replays.
> Doesn't sound production-level to me ;)
> Regards,
> Jens
Re: How to programmatically activate a spout from inside the topology
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Jia,
Quoting xujiaxj@gmail.com:
> Hi,
>
> I am building a topology that first needs to read some
> persisted data (accounts, recovery point, etc.) before the bolts
> can start processing tuples.
>
> Ideally the spout would start emitting tuples only after all the
> required data has been read into memory (maybe on the spouts, maybe
> on the bolts). What’s the general approach for dealing with such a
> use case?
I dunno about the "general approach", but I'd make the spout send an
initial "special management tuple" (i.e. on a separate channel) to all
bolts and wait until the ack comes back. Every bolt can initialize
on that message (if not already during prepare()) and only ack the
tuple once the init is done.
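
To make the handshake concrete, here is a minimal sketch of the spout-side gate only. It is a Storm-free model, not code from this thread: the class name, stream names, and message id are all hypothetical, and the three methods merely mirror the nextTuple()/ack()/fail() callbacks that Storm invokes on a spout. In a real topology the init tuple would presumably be emitted with a message id on a dedicated stream that every bolt subscribes to with an all grouping, so that the spout's ack() fires only once every bolt task has acked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/**
 * Storm-free model of a spout that holds back data until an init tuple
 * has been acked. "emitted" stands in for the output collector; all
 * names are hypothetical.
 */
class GatedSpoutSketch {
    static final String INIT_STREAM = "init";   // hypothetical stream name
    static final String DATA_STREAM = "data";   // hypothetical stream name
    static final String INIT_MSG_ID = "init-0"; // id the init tuple is tracked under

    private final Queue<String> source = new ArrayDeque<>();
    final List<String> emitted = new ArrayList<>(); // records "stream:payload"

    private boolean initInFlight = false; // init tuple sent, outcome unknown
    private boolean ready = false;        // init tuple fully acked

    GatedSpoutSketch(List<String> pending) {
        source.addAll(pending);
    }

    /** Called repeatedly, like a spout's nextTuple(). */
    void nextTuple() {
        if (!ready) {
            if (!initInFlight) {
                emitted.add(INIT_STREAM + ":load-state");
                initInFlight = true;
            }
            return; // hold back all data tuples until the init tuple is acked
        }
        String next = source.poll();
        if (next != null) {
            emitted.add(DATA_STREAM + ":" + next);
        }
    }

    /** The init tuple was acked by everything downstream: open the gate. */
    void ack(Object msgId) {
        if (INIT_MSG_ID.equals(msgId)) {
            initInFlight = false;
            ready = true;
        }
    }

    /** The init tuple failed or timed out: send it again on the next call. */
    void fail(Object msgId) {
        if (INIT_MSG_ID.equals(msgId)) {
            initInFlight = false;
        }
    }

    boolean isReady() {
        return ready;
    }

    public static void main(String[] args) {
        GatedSpoutSketch spout = new GatedSpoutSketch(Arrays.asList("a", "b"));
        spout.nextTuple();        // emits the init tuple
        spout.nextTuple();        // emits nothing, gate still closed
        spout.fail(INIT_MSG_ID);  // e.g. the init tuple timed out
        spout.nextTuple();        // re-emits the init tuple
        spout.ack(INIT_MSG_ID);   // every bolt has finished loading
        spout.nextTuple();        // data:a
        spout.nextTuple();        // data:b
        System.out.println(spout.emitted);
        // prints [init:load-state, init:load-state, data:a, data:b]
    }
}
```

The fail() branch matters if loading the persisted data can take longer than the topology's message timeout: the init tuple would then be failed and has to be sent again, so the bolt-side initialization should be safe to trigger more than once.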
> Alternatively, I could make each bolt ignore or fail tuples until
> it’s ready to process them, but that means either message loss or
> futile spout replays.
Doesn't sound production-level to me ;)
Regards,
Jens