Posted to user@storm.apache.org by Simon Cooper <si...@featurespace.co.uk> on 2014/10/27 13:31:46 UTC

Explicitly failed tuples, tuple trees, and tuple ordering

I've got a couple of specific questions about tuple ordering and failing tuples. Given a topology like so - spout S outputting to both B1 and B2, B2 outputs to B3:

  /> B1
S
  \> B2 -> B3

If a tuple is emitted to both B1 and B2, and it is explicitly failed at B1 before the same tuple is processed by B2, will the tuple tree be entirely processed before the tuple is failed at the spout? In other words, is the following sequence of events possible?


1. S emits T to B1 and B2
2. B1 calls collector.fail(T)
3. S.fail() called for T
4. B2 processes T, emits T' anchored on T
5. B3 processes T'

And on a somewhat related note, if a spout is outputting several streams, all going to the same bolt (spout S outputting streams St1 and St2 to bolt B), can tuples on different streams be received in a different order to the order they were emitted by the spout? If a spout emits the following tuples:

1. T1-T100 to St1
2. T101 to St2

Can a bolt subscribing to both St1 and St2 receive T101 before T100, or will the tuples be ordered according to the emit order at the spout? And will this apply throughout the topology?

Thanks,
SimonC

Re: Explicitly failed tuples, tuple trees, and tuple ordering

Posted by Sam Mati <sm...@appnexus.com>.
SimonC,

It wouldn't be too difficult to test these cases.

FWIW, I've found that when tuples time out they are still processed by the entire topology, even though fail() is called on the spout. My guess is that calling fail() explicitly from either B1 or B2 simply notifies the spout; if the other bolt then emits, I would guess that the emitted tuple still gets sent to subscribers.

My intuition is that it would be very difficult to send failure notifications across the entire topology in a scalable way. When B1 fails a tuple, it does not know which other bolts may still have that tuple queued up.
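To make that guess concrete, here is a toy model in plain Java. It is only a sketch of the reasoning above, not Storm code: the class name FailureModel, the per-bolt queues, and the log strings are all made up for illustration, and it assumes (as guessed above) that a failure is reported to the spout only and that nothing removes the tuple from other bolts' queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model only: hypothetical names, not Storm's API or implementation. */
public class FailureModel {

    /** Replays the five-step sequence from the question and returns the event log. */
    public static List<String> run() {
        List<String> log = new ArrayList<>();

        // Each bolt has its own input queue; nothing links B1's queue to B2's.
        Deque<String> b1Queue = new ArrayDeque<>();
        Deque<String> b2Queue = new ArrayDeque<>();
        Deque<String> b3Queue = new ArrayDeque<>();

        // 1. The spout emits T to both subscribers.
        b1Queue.add("T");
        b2Queue.add("T");
        log.add("S emits T to B1 and B2");

        // 2. B1 dequeues T and explicitly fails it.
        String atB1 = b1Queue.poll();
        log.add("B1 fails " + atB1);

        // 3. Assumption: the failure is reported to the spout and nowhere else.
        log.add("S.fail(" + atB1 + ")");

        // 4. B2 still holds its own copy of T, so it processes it and emits T'.
        String atB2 = b2Queue.poll();
        String child = atB2 + "'";
        b3Queue.add(child);
        log.add("B2 processes " + atB2 + ", emits " + child);

        // 5. B3 processes T' even though the tree was already failed at the spout.
        String atB3 = b3Queue.poll();
        log.add("B3 processes " + atB3);

        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model, stopping steps 4 and 5 would require B1 (or the spout) to reach into B2's queue, which is exactly the cross-topology notification that looks hard to do at scale. Whether a real topology behaves this way still needs to be confirmed by experiment.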

Let us know if you experiment and find any results; this is good stuff to know.

Best,
-Sam
