Posted to user@storm.apache.org by 程训焘 <ch...@gmail.com> on 2014/01/21 13:43:15 UTC

Is the consistency guaranteed in Storm?

Hi, all,

Storm guarantees no data loss, and hence full processing of all incoming
tuples, by replaying those that are not fully processed or that time out.

Every incoming tuple results in a tree of messages. Let's say one part of
the message tree is already fully processed but another part of the tree
failed. For example, we are counting words in "Hello World. How Are You."
What if "Hello" is already counted while "How" is lost? Upon detecting
the failure, Storm will replay the tuple at the spout, right? So,
will this sentence be emitted again? Will this cause "Hello" to be counted
twice?

Thanks!!!

Regards,
Cheng Xuntao

Re: Is the consistency guaranteed in Storm?

Posted by Richards Peter <hb...@gmail.com>.
Hi,

Storm provides both exactly-once and at-least-once processing semantics.
With at-least-once semantics, Storm does not replay failed tuples itself.
It notifies the spout that there was a failure. The failure notification
includes the messageId of the tuple that failed. The logic to replay the
tuple must be provided by the user.
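As a rough illustration of the replay logic described above, here is a
minimal sketch in plain Java. ReplayingSource and all of its method names
are hypothetical. It does not use or implement any Storm class. It only
mirrors the general shape of a spout that keeps each emitted message until
it is acked and re-queues it when a failure is reported for its message ID.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a spout. NOT a Storm class; names are assumptions.
class ReplayingSource {
    // Every message that has been offered but not yet acked, keyed by message ID.
    private final Map<Long, String> pending = new HashMap<>();
    // IDs waiting to be emitted: new messages and messages queued for replay.
    private final Queue<Long> toEmit = new ArrayDeque<>();
    private long nextId = 0;

    // Registers a new message and queues it for emission.
    void offer(String sentence) {
        long id = nextId++;
        pending.put(id, sentence);
        toEmit.add(id);
    }

    // Returns the ID of the next message to emit, or null if none is waiting.
    Long nextTuple() {
        return toEmit.poll();
    }

    // Looks up the payload for a message that is still pending.
    String payload(long id) {
        return pending.get(id);
    }

    // Called when the message was fully processed: forget it.
    void ack(long id) {
        pending.remove(id);
    }

    // Called when processing failed or timed out: queue the same message again.
    // A failure for an already-acked ID is ignored.
    void fail(long id) {
        if (pending.containsKey(id)) {
            toEmit.add(id);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Note that the whole sentence is emitted again on replay, so a downstream
counter that already counted "Hello" would count it a second time unless
it guards against duplicates.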

With exactly-once semantics, Storm can take care of replaying the failed
tuple. However, the logic for counting each word correctly must still be
provided by the application developer.
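One common way to write such counting logic is to store a transaction ID
next to each count and skip an update whose ID was already applied. The
sketch below is plain Java with hypothetical names (TxWordCounter, apply,
count) and is not Trident's API. It assumes that a replayed batch carries
the same transaction ID and the same tuples as the original, and that
batches are applied in transaction ID order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical duplicate-safe counter. NOT a Storm or Trident class.
class TxWordCounter {
    // Count for one word plus the ID of the last transaction applied to it.
    private static final class Entry {
        long count = 0;
        long lastTxId = -1; // assumes real transaction IDs are non-negative
    }

    private final Map<String, Entry> state = new HashMap<>();

    // Adds delta to the word's count unless this transaction was already
    // applied to that word. Returns true if the count changed.
    boolean apply(long txId, String word, long delta) {
        Entry e = state.computeIfAbsent(word, w -> new Entry());
        if (e.lastTxId == txId) {
            return false; // replay of a batch that was already counted
        }
        e.count += delta;
        e.lastTxId = txId;
        return true;
    }

    // Returns the current count, or 0 for a word that was never seen.
    long count(String word) {
        Entry e = state.get(word);
        return e == null ? 0 : e.count;
    }
}
```

With this guard, replaying the batch that contains "Hello World. How Are
You." after "How" was lost updates "How" but leaves the count for "Hello"
unchanged.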

Please explore Trident topologies to learn more about exactly-once
processing semantics. Previously there was a concept called transactional
topologies, which provided exactly-once processing semantics. However,
transactional topologies are deprecated now.

Regards,
Richards Peter.