Posted to user@storm.apache.org by Nilesh Chhapru <ni...@ugamsolutions.com> on 2014/11/03 13:21:02 UTC

Message Flows Even After Timeout

Hi All,

I have a Storm Kafka spout that emits to a set of bolts. After some computation on the message, the result is passed on to another Kafka topic using a Kafka producer.

In my application, even if a message times out, it is still sent to the producer, which results in duplicate entries in the resulting spout.

Is there a way to kill a message once it times out? In my case Storm isn't killing the message even when the spout's fail method is called.

Kindly let me know if I am missing some configuration here.

Regards,
Nilesh Chhapru.


________________________________
---------------------------------------------------------------------------------------Disclaimer----------------------------------------------------------------------------------------------

****Opinions expressed in this e-mail are those of the author and do not necessarily represent those of Ugam. Ugam does not accept any responsibility or liability for it. This e-mail message may contain proprietary, confidential or legally privileged information for the sole use of the person or entity to whom this message was originally addressed. Any review, re-transmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you have received this e-mail in error, please delete it and all attachments from any servers, hard drives or any other media.

Warning: Sufficient measures have been taken to scan any presence of viruses however the recipient should check this email and any attachments for the presence of viruses. Ugam accepts no liability for any damage caused by any virus transmitted by this email. ****

Re: Message Flows, Even After Timeout

Posted by Sam Mati <sm...@appnexus.com>.
@Vladi,

It seems simple to me.

Storm could send a "spout-timestamp" with each tuple.

Then, when a timeout occurs, Storm could notify all workers that any tuple with a "spout-timestamp" < XYZ should be discarded (or removed from the input buffer).  At worst, each component contains some small list of invalid timestamps to check against before calling .execute().
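
A minimal sketch of the per-component check described above, for illustration only. Storm does not provide this mechanism; the class `SpoutTimestampCutoff` and its methods are hypothetical names, and a single forward-moving cutoff is assumed in place of a list of invalid timestamps.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: not a Storm API. Models the proposed check that a
 * component would run before calling execute() on a tuple.
 */
final class SpoutTimestampCutoff {
    // Tuples whose spout timestamp is below this value count as timed out.
    private final AtomicLong cutoffMillis = new AtomicLong(Long.MIN_VALUE);

    /** Records a timeout notification; the cutoff only ever moves forward. */
    void invalidateBefore(long timestampMillis) {
        cutoffMillis.accumulateAndGet(timestampMillis, Math::max);
    }

    /** Returns true when a tuple with this spout timestamp should be dropped. */
    boolean shouldDiscard(long spoutTimestampMillis) {
        return spoutTimestampMillis < cutoffMillis.get();
    }
}
```

With one cutoff per component the check is a single comparison per tuple, at the cost of treating all spouts as sharing one clock, which may not hold in practice.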

-Sam

From: Vladi Feigin <vl...@gmail.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Wednesday, November 5, 2014 3:09 PM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Message Flows, Even After Timeout

@Sam.
It's not so simple (in the Storm core) to tell whether it's a network outage or a slow-running insert into a DB.
V



Re: Message Flows, Even After Timeout

Posted by Vladi Feigin <vl...@gmail.com>.
@Sam.
It's not so simple (in the Storm core) to tell whether it's a network
outage or a slow-running insert into a DB.
V

On Wed, Nov 5, 2014 at 9:38 PM, Sam Mati <sm...@appnexus.com> wrote:

>  This is intended behavior of Storm (which I strongly disagree with).
>  "timeouts" are intended to fail any tuples that have been lost due to
> workers going down, network issues, etc… not as a means of controlling flow.
>
>  You should set max pending lower, set the timeout higher, and/or have
> your Bolts ignore "old" tuples by adding a timestamp to the original tuple
> and having the "execute" method of Bolts test that the tuple is not too old.
>
>  Other threads that address this:
> "KafkaSpout stops pulling data after a few hours"
> "What is the purpose of timing out tuples?"
>
>  Best,
> -Sam

Re: Message Flows, Even After Timeout

Posted by Sam Mati <sm...@appnexus.com>.
This is intended behavior of Storm (which I strongly disagree with).  "timeouts" are intended to fail any tuples that have been lost due to workers going down, network issues, etc… not as a means of controlling flow.

You should set max pending lower, set the timeout higher, and/or have your Bolts ignore "old" tuples by adding a timestamp to the original tuple and having the "execute" method of Bolts test that the tuple is not too old.
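
A minimal sketch of the "ignore old tuples" check suggested above, assuming the spout adds its emit time in epoch milliseconds to each tuple. `StaleTupleFilter` is a hypothetical helper, not part of Storm; in a bolt, `execute` would read the timestamp from the tuple (for example from an assumed field such as "spoutTimestamp") and skip the Kafka producer call when `isStale` returns true.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not a Storm class): decides whether a tuple is too old
 * to be worth processing, based on a timestamp the spout attached to it.
 */
final class StaleTupleFilter {
    private final long maxAgeMillis;

    StaleTupleFilter(long maxAge, TimeUnit unit) {
        if (maxAge < 0) {
            throw new IllegalArgumentException("maxAge must be non-negative");
        }
        this.maxAgeMillis = unit.toMillis(maxAge);
    }

    /** True when the tuple is strictly older than the maximum age at nowMillis. */
    boolean isStale(long emittedAtMillis, long nowMillis) {
        return nowMillis - emittedAtMillis > maxAgeMillis;
    }

    /** Convenience overload that compares against the local wall clock. */
    boolean isStale(long emittedAtMillis) {
        return isStale(emittedAtMillis, System.currentTimeMillis());
    }
}
```

The maximum age would normally be set below the topology's message timeout. This reduces duplicates rather than eliminating them: a tuple can pass the check just before the timeout fires, and the comparison depends on the clocks of the spout and bolt machines being reasonably in sync.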

Other threads that address this:
"KafkaSpout stops pulling data after a few hours"
"What is the purpose of timing out tuples?"

Best,
-Sam

From: Nilesh Chhapru <ni...@ugamsolutions.com>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Wednesday, November 5, 2014 1:13 PM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Message Flows, Even After Timeout

Hi All,

I have a Storm Kafka spout that emits to a set of bolts. After some computation on the message, the result is passed on to another Kafka topic using a Kafka producer.

In my application, even if a message times out, it is still sent to the producer, which results in duplicate entries in the resulting spout.

Is there a way to kill a message once it times out? In my case Storm isn't killing the message even when the spout's fail method is called.

Kindly let me know if I am missing some configuration here.

Regards,
Nilesh Chhapru.


Re: Message Flows, Even After Timeout

Posted by Vladi Feigin <vl...@gmail.com>.
Hi,
Take a look at this thread:
https://mail.google.com/mail/u/1/#inbox/14967657989b14e4
You should tune your configuration. If you are seeing many timeouts,
check your max spout pending and message timeout settings.
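
A minimal sketch of the two settings mentioned above, written against a plain map so it runs without Storm on the classpath. The keys are Storm's topology configuration names; Storm's own `Config` class offers `setMaxSpoutPending` and `setMessageTimeoutSecs` for the same purpose. The class `TimeoutTuning`, the `drainSeconds` estimate, and any numbers passed in are assumptions for illustration, not recommended values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for reasoning about max spout pending vs. message timeout. */
final class TimeoutTuning {
    // Topology configuration keys used by Storm.
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";

    /** Builds a topology configuration map holding the two settings. */
    static Map<String, Object> topologyConf(int maxSpoutPending, int messageTimeoutSecs) {
        if (maxSpoutPending <= 0 || messageTimeoutSecs <= 0) {
            throw new IllegalArgumentException("both settings must be positive");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put(MAX_SPOUT_PENDING, maxSpoutPending);
        conf.put(MESSAGE_TIMEOUT_SECS, messageTimeoutSecs);
        return conf;
    }

    /**
     * Rough, assumed estimate of how long a full window of pending tuples
     * takes to clear at a measured throughput (tuples per second).
     */
    static double drainSeconds(int maxSpoutPending, double tuplesPerSecond) {
        if (tuplesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return maxSpoutPending / tuplesPerSecond;
    }
}
```

If the estimated drain time for a full pending window approaches the message timeout, tuples at the back of the queue are likely to time out while still waiting, which is the situation where lowering max spout pending or raising the timeout helps.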

Vladi

On Wed, Nov 5, 2014 at 8:13 PM, Nilesh Chhapru <
nilesh.chhapru@ugamsolutions.com> wrote:

>  Hi All,
>
> I have a Storm Kafka spout that emits to a set of bolts. After some
> computation on the message, the result is passed on to another Kafka topic
> using a Kafka producer.
>
> In my application, even if a message times out, it is still sent to the
> producer, which results in duplicate entries in the resulting spout.
>
> Is there a way to kill a message once it times out? In my case Storm isn't
> killing the message even when the spout's fail method is called.
>
> Kindly let me know if I am missing some configuration here.
>
> *Regards*,
>
> *Nilesh Chhapru.*

Message Flows, Even After Timeout

Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Hi All,

I have a Storm Kafka spout that emits to a set of bolts. After some computation on the message, the result is passed on to another Kafka topic using a Kafka producer.

In my application, even if a message times out, it is still sent to the producer, which results in duplicate entries in the resulting spout.

Is there a way to kill a message once it times out? In my case Storm isn't killing the message even when the spout's fail method is called.

Kindly let me know if I am missing some configuration here.

Regards,
Nilesh Chhapru.
