Posted to user@storm.apache.org by Shawn Bonnin <sh...@gmail.com> on 2015/01/21 19:57:31 UTC

How does storm guarantee order of tuples processed?

Our use case requires that tuples be processed in order, even across
failures.

We have SpoutA sending data to bolts B & C, and bolt D is the last bolt; it
aggregates data from B & C and writes to a database.

We want to make sure that, whether we use tuple-at-a-time processing or the
Trident API, the data always gets processed in the same order as it was
read by our spout. Given that bolts B & C will run with parallelism and see
intermittent failures, my question is the following:

How does Storm guarantee processing order of tuples?

Thanks in advance!
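
For concreteness, the topology described above would be wired roughly like
this in the Storm 0.9-era Java API. SpoutA, BoltB, BoltC, and BoltD are
hypothetical stand-ins for the poster's components, and the parallelism
hints are made up for illustration:

    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout-a", new SpoutA(), 1);

    // B and C each consume the spout's stream in parallel.
    builder.setBolt("bolt-b", new BoltB(), 4).shuffleGrouping("spout-a");
    builder.setBolt("bolt-c", new BoltC(), 4).shuffleGrouping("spout-a");

    // A single D task aggregates both streams and writes to the database.
    builder.setBolt("bolt-d", new BoltD(), 1)
           .globalGrouping("bolt-b")
           .globalGrouping("bolt-c");

Note that the shuffle groupings into B and C are exactly where cross-tuple
ordering is lost once the parallelism is greater than one.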

Re: How does storm guarantee order of tuples processed?

Posted by Shawn Bonnin <sh...@gmail.com>.
Got it. Thanks!


Re: How does storm guarantee order of tuples processed?

Posted by Nathan Marz <na...@nathanmarz.com>.
Spouts and bolts give you an at-least-once guarantee, so it's completely up
to you to figure out how to get your app to work with that. Storm is unable
to provide you any help beyond replaying the tuples.

Trident, on the other hand, does all state updates through the "State"
abstraction and gives you a monotonically increasing batch id whenever
state updates are to be applied. If you store that batch id with whatever
state you're updating, you can detect whether you're seeing something that
has been successfully processed before or whether it's brand new. This is
described in the state doc I sent.
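
A minimal sketch of that pattern, against Trident's State interface
(storm.trident.state.State in the 0.9-era packages). The Db client and its
methods are hypothetical placeholders for a real store; the key point is
comparing the stored transaction id in beginCommit:

    import storm.trident.state.State;

    // Hypothetical minimal database client used by this sketch.
    interface Db {
        Long readLastTxid();
        void writeLastTxid(Long txid);
        void increment(String key, long delta);
    }

    // Stores the batch/transaction id alongside the data so a replayed
    // batch can be detected and skipped. Assumes a transactional spout,
    // i.e. a retried batch contains exactly the same tuples as before.
    public class DedupingState implements State {
        private final Db db;
        private boolean skipBatch;

        public DedupingState(Db db) {
            this.db = db;
        }

        public void beginCommit(Long txid) {
            // If this txid was already committed, the batch is a replay
            // whose updates were applied; skip them to stay exactly-once.
            Long lastTxid = db.readLastTxid();
            skipBatch = (lastTxid != null && lastTxid >= txid);
        }

        public void update(String key, long delta) {
            if (!skipBatch) {
                db.increment(key, delta);
            }
        }

        public void commit(Long txid) {
            // Persist the txid together with (ideally atomically with)
            // the updates so beginCommit can see it on a replay.
            db.writeLastTxid(txid);
        }
    }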

-- 
Twitter: @nathanmarz
http://nathanmarz.com

Re: How does storm guarantee order of tuples processed?

Posted by Shawn Bonnin <sh...@gmail.com>.
Nathan, first, thanks a lot for the quick response. I read through the
Trident guarantees. Seems like micro-batching will help with the
exactly-once guarantees on the bolts that write to external data stores in
the commit phase of a batch.

However, I have a clarifying question about what you said:

*Suppose for example your spout emits tuples A, B, C, D, E and tuple C
fails. A spout like KestrelSpout would re-emit only tuple C. KafkaSpout, on
the other hand, would also re-emit all tuples after the failed tuple. So it
would re-emit C, D, and E, even if D and E were successfully processed.*

My question is: how does the KafkaSpout know how many tuples were sent
through after C? Does it rely on Zookeeper to get the offsets and just
replay everything after that offset? If yes, do we have to handle the
repercussions of state corruption etc. in our downstream bolts? Our
downstream bolts will be looking for event-sequence-based patterns, so when
they see the same event twice, they will need smarts to know whether that
was due to a system failure and replay or an actual business occurrence.

Seems like these smarts will need to be built regardless of whether we do
tuple-at-a-time processing or use Trident.

Am I correct in my assessment?


Thanks a lot!
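
One rough shape those smarts could take in a plain bolt, assuming each
tuple carries the Kafka offset it was read from ("offset" is a hypothetical
field name attached by the spout):

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    // Drops tuples whose offset was already seen, treating them as
    // replays rather than new business events. Assumes a single task per
    // partition, so offsets only move backwards during a replay.
    public class ReplayFilterBolt extends BaseBasicBolt {
        private long highWaterMark = -1;

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            long offset = tuple.getLongByField("offset");
            if (offset <= highWaterMark) {
                return; // replay caused by an upstream failure; ignore it
            }
            highWaterMark = offset;
            // ... normal event-sequence pattern matching would go here ...
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output fields in this sketch
        }
    }

The catch is that the high-water mark here lives in memory and is lost when
a worker restarts; persisting it durably alongside your state is
essentially what Trident's batch id mechanism does for you.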


Re: How does storm guarantee order of tuples processed?

Posted by Nathan Marz <na...@nathanmarz.com>.
There's no such thing as a total order in a distributed system, as streams
are processed in parallel. The ordering guarantee Storm provides is that
tuples sent between tasks are received in the order they were sent.

Another part of your question is what kind of ordering guarantees you get
during failures. With regular Storm, when a tuple fails, it is up to the
spout to determine what to re-emit. Suppose, for example, your spout emits
tuples A, B, C, D, E and tuple C fails. A spout like KestrelSpout would
re-emit only tuple C. KafkaSpout, on the other hand, would re-emit all
tuples from the failed tuple onward. So it would re-emit C, D, and E, even
if D and E were successfully processed.

Trident provides stronger ordering guarantees: there is a total ordering
among the commit phases of batches, and if a batch fails to commit it will
be retried indefinitely until it succeeds. See
http://storm.apache.org/documentation/Trident-state.html and
http://storm.apache.org/documentation/Trident-spouts.html for more info on
this.
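
To make the per-link guarantee concrete: if per-key order is what matters,
a fields grouping sends every tuple with a given key to the same task, so
tuples for that key arrive in the order the upstream task emitted them.
The component classes and the "userId" field below are hypothetical:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    // All tuples sharing a "userId" go to the same pattern-bolt task and
    // are received in emit order for that key. There is still no ordering
    // across different keys, nor across multiple spout tasks.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", new KafkaSpoutStub(), 1);
    builder.setBolt("pattern-bolt", new PatternBolt(), 8)
           .fieldsGrouping("kafka-spout", new Fields("userId"));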

-- 
Twitter: @nathanmarz
http://nathanmarz.com

Re: How does storm guarantee order of tuples processed?

Posted by Shawn Bonnin <sh...@gmail.com>.
We're trying to look for patterns in the input stream based on arrival
sequence. We can use something like Kafka on the input to guarantee order,
but once the tuples enter the topology, how can we make sure that they are
processed in the same order as they arrived on Kafka?


Re: How does storm guarantee order of tuples processed?

Posted by Naresh Kosgi <na...@gmail.com>.
Also, more information about why you need a certain order for processing
would help in recommending how to approach the problem.


Re: How does storm guarantee order of tuples processed?

Posted by Naresh Kosgi <na...@gmail.com>.
Storm as a framework does not guarantee order. You will have to code it
yourself if you would like your tuples processed in a certain order.
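
Coding it yourself could look roughly like the sketch below: a resequencing
bolt that buffers tuples and emits them in sequence-number order. It
assumes the spout attaches a gap-free "seq" field (a hypothetical name):

    import java.util.TreeMap;

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Buffers out-of-order tuples and emits the contiguous run each time
    // the next expected sequence number arrives. Must run as a single
    // task (e.g. fed by a global grouping) to see every sequence number.
    public class ResequencerBolt extends BaseBasicBolt {
        private long nextSeq = 0;
        private final TreeMap<Long, Object> buffer =
                new TreeMap<Long, Object>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            long seq = tuple.getLongByField("seq");
            buffer.put(seq, tuple.getValueByField("payload"));
            // Drain everything that is now in sequence.
            while (!buffer.isEmpty() && buffer.firstKey() == nextSeq) {
                Object payload = buffer.pollFirstEntry().getValue();
                collector.emit(new Values(nextSeq, payload));
                nextSeq++;
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("seq", "payload"));
        }
    }

The buffer grows without bound if a sequence number never arrives, so in
practice it needs a timeout or fail path, and the single task becomes a
throughput bottleneck; that is the price of imposing a total order.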


Re: How does storm guarantee order of tuples processed?

Posted by Shawn Bonnin <sh...@gmail.com>.
Resending...

Our use case requires that tuples be processed in order, even across
failures.

We have SpoutA sending data to bolts B & C, and bolt D is the last bolt; it
aggregates data from B & C and writes to a database.

We want to make sure that, whether we use tuple-at-a-time processing or the
Trident API, the data always gets processed in the same order as it was
read by our spout. Given that bolts B & C will run with parallelism and see
intermittent failures, my question is the following:

How does Storm guarantee processing order of tuples?

Thanks in advance!
