Posted to user@storm.apache.org by Simon Cooper <si...@featurespace.co.uk> on 2014/08/19 16:48:41 UTC

What happens on a batch timeout?

When a batch times out, what happens to all the current in-flight tuples when the batch is replayed? Are they removed from the executor queues, or are they left in the queues, so they might be received by the executor as part of the replayed batch/next batch, if the executor is running behind?

SimonC

RE: What happens on a batch timeout?

Posted by Simon Cooper <si...@featurespace.co.uk>.
This isn’t a spout issue; it is a _topology_ issue. Specifically, I believe that tuples delayed somewhere in the topology, maybe in a queue or a bolt, are being overtaken by the batch start/end tuples, which breaks any ordering constraints within a batch.

So suppose this is emitted (read right to left, so A is sent first), and it is a condition of the topology that A is always sent first:
xxxxxxxxxxA

If the batch times out and a batch retry R is emitted while some of the x tuples are still delayed, the bolt receives this (again, RTL):
yyyyyyAxxxxxxxRxxxxA

This breaks the condition that A is always the first tuple in a batch.
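
To make the two diagrams concrete, here is a minimal sketch in plain Java (no Storm dependency) that reverses each right-to-left diagram into arrival order and reports the first tuple that breaks the "A comes first" rule. The class and method names are hypothetical, and treating R as the start of a new batch is an assumption taken from the description above, not from Trident's source.

```java
final class BatchOrderCheck {

    /** Converts a right-to-left diagram into arrival order (first received comes first). */
    static String arrivalOrder(String rtlDiagram) {
        return new StringBuilder(rtlDiagram).reverse().toString();
    }

    /**
     * Returns the index (in arrival order) of the first tuple that violates
     * "A is the first tuple after each batch start", or -1 if there is none.
     * Assumption: a batch starts at the beginning of the stream and after every 'R'.
     */
    static int firstViolation(String arrival) {
        boolean expectA = true;
        for (int i = 0; i < arrival.length(); i++) {
            char c = arrival.charAt(i);
            if (c == 'R') {          // retry marker: the next data tuple should be A
                expectA = true;
                continue;
            }
            if (expectA) {
                if (c != 'A') {
                    return i;        // a delayed tuple overtook the replayed A
                }
                expectA = false;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String normal = arrivalOrder("xxxxxxxxxxA");
        String replayed = arrivalOrder("yyyyyyAxxxxxxxRxxxxA");
        System.out.println(normal + " -> first violation at " + firstViolation(normal));
        System.out.println(replayed + " -> first violation at " + firstViolation(replayed));
    }
}
```

In the second diagram the violation is the tuple received immediately after R: a delayed x from the timed-out attempt arrives before the A of the replay.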

From: Mayur Rustagi [mailto:mayur.rustagi@gmail.com]
Sent: 25 September 2014 11:59
To: user@storm.incubator.apache.org
Subject: Re: What happens on a batch timeout?

It seems to me that it depends on which spout you are using. If you are using Kafka and a transactional spout, then the replay is consistent each time. With any other queue, the batch may be different.
This page lists the types of spouts and their limitations:
http://storm.incubator.apache.org/documentation/Trident-spouts.html


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi<https://twitter.com/mayur_rustagi>


On Thu, Sep 25, 2014 at 3:10 PM, Simon Cooper <si...@featurespace.co.uk> wrote:
Does anyone have any information that could help with this? I’m baffled by the behaviour we’re seeing: events are being received out of order on a batch replay. The only reason I can think of is that tuples are left over from the previous batch in the input queues, but trying to use the batch id to filter tuples doesn’t seem to work.

Unfortunately, I can’t understand the behaviour without some input from someone who knows how trident works and can match this behaviour onto what trident is *meant* to do on a batch replay.

SimonC

From: Simon Cooper [mailto:simon.cooper@featurespace.co.uk]
Sent: 19 August 2014 16:10
To: user@storm.incubator.apache.org
Subject: RE: What happens on a batch timeout?

BTW, I’m referring to trident batches.

From: Simon Cooper [mailto:simon.cooper@featurespace.co.uk]
Sent: 19 August 2014 15:49
To: user@storm.incubator.apache.org
Subject: What happens on a batch timeout?

When a batch times out, what happens to all the current in-flight tuples when the batch is replayed? Are they removed from the executor queues, or are they left in the queues, so they might be received by the executor as part of the replayed batch/next batch, if the executor is running behind?

SimonC


Re: What happens on a batch timeout?

Posted by Mayur Rustagi <ma...@gmail.com>.
It seems to me that it depends on which spout you are using. If you are using
Kafka and a transactional spout, then the replay is consistent each time. With
any other queue, the batch may be different.
This page lists the types of spouts and their limitations:
http://storm.incubator.apache.org/documentation/Trident-spouts.html


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



RE: What happens on a batch timeout?

Posted by Simon Cooper <si...@featurespace.co.uk>.
Does anyone have any information that could help with this? I'm baffled by the behaviour we're seeing: events are being received out of order on a batch replay. The only reason I can think of is that tuples are left over from the previous batch in the input queues, but trying to use the batch id to filter tuples doesn't seem to work.

Unfortunately, I can't understand the behaviour without some input from someone who knows how trident works and can match this behaviour onto what trident is *meant* to do on a batch replay.
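
For illustration, the kind of filter described above might look like the sketch below. It rests on an assumption: that a replayed batch keeps the same transaction id but carries a higher attempt number (Trident's batch id object exposes both a transaction id and an attempt id). If that holds, filtering on the transaction id alone cannot tell a stale tuple from a replayed one. The class is hypothetical, plain Java, and not part of Storm.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleAttemptFilter {

    // Highest attempt number seen so far for each transaction id.
    private final Map<Long, Integer> latestAttempt = new HashMap<>();

    /**
     * Returns true if a tuple tagged with (txid, attemptId) should be processed,
     * and false if a newer attempt of the same transaction has already been seen.
     * Assumption: attempt ids increase with every replay of the same txid.
     */
    boolean accept(long txid, int attemptId) {
        Integer latest = latestAttempt.get(txid);
        if (latest == null || attemptId > latest) {
            latestAttempt.put(txid, attemptId);   // first tuple of a new attempt
            return true;
        }
        return attemptId == latest;               // older attempts are dropped
    }

    public static void main(String[] args) {
        StaleAttemptFilter filter = new StaleAttemptFilter();
        System.out.println(filter.accept(7L, 0)); // original attempt
        System.out.println(filter.accept(7L, 1)); // replay of the same transaction
        System.out.println(filter.accept(7L, 0)); // delayed tuple from the timed-out attempt
    }
}
```

Note the limit of this approach: a stale tuple that arrives before the first tuple of the new attempt is still accepted, because the filter has not yet seen anything newer.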

SimonC


RE: What happens on a batch timeout?

Posted by Simon Cooper <si...@featurespace.co.uk>.
BTW, I'm referring to trident batches.
