Posted to user@storm.apache.org by Giancarlo Pagano <Gi...@beamly.com> on 2016/05/16 12:30:48 UTC

Emit order in Kafka spout

Hello,

I’m trying to understand what kind of ordering guarantee is expected from the Kafka spout in case of failure.
I’m using Storm 0.9.x and configuring the spout as described at http://storm.apache.org/releases/0.9.6/storm-kafka.html, with ZkHosts, changing only startOffsetTime to LatestTime. The rest of the config is left at its defaults.
I’m depending on:
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>

Searching the mailing list I’ve found this in a previous message, http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3CCAOv%2BhsQNYp1BOACHnvzSS%2BSwS%2BKXvb8Q-7FiTWCiWqFEU0-h%2Bw%40mail.gmail.com%3E:

Suppose for example your spout emits tuples A, B, C, D, E and tuple C fails.[…] KafkaSpout, on the other hand, would also re-emit all tuples after the failed tuple. So it would re-emit C, D, and E, even if D and E were successfully processed.

However, I haven’t been able to reproduce this behavior in my tests. After a failure, only the failed tuple is re-emitted. In the above example, only C is re-emitted, not D and E. All the tuples are in the same Kafka partition.
Am I missing some config that enables this behavior, or is there a different implementation of the Kafka spout that supports it?

Thanks,
Giancarlo

Re: Emit order in Kafka spout

Posted by Abhishek Agarwal <ab...@gmail.com>.
That observation may be based on an older version of the storm-kafka module. The
current storm-kafka module re-sends only the messages that failed.
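
To make the difference between the two behaviors concrete, here is a toy model in plain Java. It is only a sketch and does not use or reproduce the storm-kafka code: the class ReplayDemo and the methods replayFailedOnly and replayFromFailed are hypothetical names made up for this illustration, and tuples are modeled as strings emitted in order from a single partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of two replay strategies; NOT the storm-kafka implementation.
public class ReplayDemo {

    // Strategy 1 (what the thread reports for the current module):
    // re-emit only the tuple that failed.
    public static List<String> replayFailedOnly(List<String> emitted, String failed) {
        List<String> replay = new ArrayList<>();
        if (emitted.contains(failed)) {
            replay.add(failed);
        }
        return replay;
    }

    // Strategy 2 (what the older message describes):
    // rewind to the failed tuple and re-emit it plus everything after it,
    // even tuples that were already processed successfully.
    public static List<String> replayFromFailed(List<String> emitted, String failed) {
        int index = emitted.indexOf(failed);
        if (index < 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(emitted.subList(index, emitted.size()));
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("A", "B", "C", "D", "E");
        System.out.println(replayFailedOnly(emitted, "C")); // prints [C]
        System.out.println(replayFromFailed(emitted, "C")); // prints [C, D, E]
    }
}
```

With the first strategy, C arrives downstream after D and E, so a bolt that depends on per-partition order has to tolerate the late tuple. With the second, order is preserved on replay but D and E are processed twice.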

On Mon, May 16, 2016 at 6:00 PM, Giancarlo Pagano <Gi...@beamly.com>
wrote:

> Hello,
>
> I’m trying to understand what kind of ordering guarantee is expected from
> the Kafka spout in case of failure.
> I’m using Storm 0.9.x, configuring the spout as described here
> http://storm.apache.org/releases/0.9.6/storm-kafka.html, with ZkHosts and
> only changing startOffsetTime to be LatestTime. The rest of the config is
> not modified and kept as default.
> I’m depending on:
> <groupId>org.apache.storm</groupId>
> <artifactId>storm-kafka</artifactId>
>
> Searching the mailing list I’ve found this in a previous message,
> http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3CCAOv%2BhsQNYp1BOACHnvzSS%2BSwS%2BKXvb8Q-7FiTWCiWqFEU0-h%2Bw%40mail.gmail.com%3E
> :
>
> Suppose for example your spout emits tuples A, B, C, D, E and tuple C fails.[…] KafkaSpout, on the other hand, would also re-emit all tuples after the failed tuple. So it would re-emit C, D, and E, even if D and E were successfully processed.
>
> However, I haven’t be able to reproduce this behavior in my tests. After a
> failure, only the failed tuple is re-emitted. In the above example, only C
> is re-emitted, not D and E.  All the tuples are in the same kafka partition.
> Am I missing some config to enable this behavior or maybe is there a
> different implementation of the kafka spout that supports this?
>
> Thanks,
> Giancarlo
>



-- 
Regards,
Abhishek Agarwal