Posted to dev@storm.apache.org by Sachin Pasalkar <Sa...@symantec.com> on 2015/09/12 13:41:50 UTC

How to stop the spout from replaying a tuple after repeated downstream failures?

Hi,

As far as I know, Storm follows the "at least once" model, which means it makes sure each tuple gets fully processed at least once. So my question is: if I receive some unexpected data, a certain bolt in my topology will start failing those tuples. The spout will get the failure notification from the acker thread and will re-emit them. Since I know they are always going to fail, is there any way to tell the spout to stop replaying a tuple after X attempts?

Thanks,
Sachin

RE: How to stop the spout from replaying a tuple after repeated downstream failures?

Posted by Sachin Pasalkar <Sa...@symantec.com>.
Thanks for the reply; that's exactly how we implemented it, using our own unique key. I was just looking for an option provided by the Kafka spout itself to drop a tuple after a certain number of attempts. Don't you think that would be the wiser option, rather than everyone coding it themselves :)?

Also, with Trident, if a tuple is always going to fail in some bolt, Trident will likewise keep replaying it continuously until one of the attempts succeeds. Is my observation correct?
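
For reference, here is a minimal sketch of what such a cap could look like if you implement it yourself: a spout that tracks failures per message id in fail() and stops replaying after MAX_RETRIES. This is not a built-in option of the storm-kafka spout; the class, the cap, and the fetchNextMessage() stub are illustrative assumptions (2015-era backtype.storm package names):

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class BoundedRetrySpout extends BaseRichSpout {
    private static final int MAX_RETRIES = 3;     // X attempts, then give up
    private SpoutOutputCollector collector;
    private Map<String, Integer> retryCounts;     // msgId -> failures so far
    private Map<String, String> pending;          // msgId -> payload, kept for replay

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.retryCounts = new HashMap<>();
        this.pending = new HashMap<>();
    }

    @Override
    public void nextTuple() {
        String msg = fetchNextMessage();          // hypothetical stand-in for your source
        if (msg == null) {
            return;
        }
        String msgId = UUID.randomUUID().toString();
        pending.put(msgId, msg);
        collector.emit(new Values(msg), msgId);   // anchored emit so ack/fail are reported
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);                    // fully processed; forget the tuple
        retryCounts.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        int attempts = retryCounts.merge((String) msgId, 1, Integer::sum);
        if (attempts >= MAX_RETRIES) {
            pending.remove(msgId);                // give up on the poison message
            retryCounts.remove(msgId);
        } else {
            collector.emit(new Values(pending.get((String) msgId)), msgId); // replay
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private String fetchNextMessage() {
        return null;                              // stub: wire this to Kafka or another queue
    }
}

In principle the same bookkeeping could be layered onto a Kafka spout by subclassing it and overriding fail().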

-----Original Message-----
From: Venkat Gmail [mailto:mvraoj@gmail.com] 
Sent: Saturday, September 12, 2015 10:28 PM
To: dev@storm.apache.org
Cc: Venkat Mantirraju
Subject: Re: How to stop the spout from replaying a tuple after repeated downstream failures?

Not sure whether you received my answer. Hence resenting. 

Thanks,
Venkat

> On Sep 12, 2015, at 9:52 AM, Venkat Gmail <mv...@gmail.com> wrote:
> 
> Here are a bunch of solutions.
> Don't drop the tuple; always ack it.
> 
> 1) Try using an exactly-once topology like Trident.
> 2) If you want to use a regular Storm topology, which guarantees at-least-once (>=1) processing:
> A) Override the Kafka spout and handle failures differently, or
> B) In your bolt, store a unique message id for each message when you put it into Kafka. These message ids can be kept in a Redis store. If a message id already exists (i.e. it was processed already), just ack the tuple; otherwise process it and then ack it. This way you achieve 100% accuracy with a regular Storm topology, without Trident. For the message id, you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse the same id.
> Redis store: maintain a small-footprint Redis store with a TTL of 4 hours for each message id. I prefer the 2nd one.
> 
> Let me know if you have any questions and I will be glad to assist. 
> Thanks,
> Venkat
> 
>> On Sep 12, 2015, at 7:56 AM, Nathan Leung <nc...@gmail.com> wrote:
>> 
>> Don't fail the tuple, just drop it (don't emit). Btw the user list is 
>> better for this type of question.
>> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" 
>> <Sa...@symantec.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> As far as I know, Storm follows the "at least once" model, which
>>> means it makes sure each tuple gets fully processed at least once.
>>> So my question is: if I receive some unexpected data, a certain bolt
>>> in my topology will start failing those tuples. The spout will get
>>> the failure notification from the acker thread and will re-emit
>>> them. Since I know they are always going to fail, is there any way
>>> to tell the spout to stop replaying a tuple after X attempts?
>>> 
>>> Thanks,
>>> Sachin
>>> 

Re: How to stop the spout from replaying a tuple after repeated downstream failures?

Posted by Venkat Gmail <mv...@gmail.com>.
Re sending I meant :)

Thanks,
Venkat

> On Sep 12, 2015, at 9:57 AM, Venkat Gmail <mv...@gmail.com> wrote:
> 
> Not sure whether you received my answer. Hence resenting. 
> 
> Thanks,
> Venkat
> 
>> On Sep 12, 2015, at 9:52 AM, Venkat Gmail <mv...@gmail.com> wrote:
>> 
>> Here are a bunch of solutions.
>> Don't drop the tuple; always ack it.
>> 
>> 1) Try using an exactly-once topology like Trident.
>> 2) If you want to use a regular Storm topology, which guarantees at-least-once (>=1) processing:
>> A) Override the Kafka spout and handle failures differently, or
>> B) In your bolt, store a unique message id for each message when you put it into Kafka. These message ids can be kept in a Redis store. If a message id already exists (i.e. it was processed already), just ack the tuple; otherwise process it and then ack it. This way you achieve 100% accuracy with a regular Storm topology, without Trident. For the message id, you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse the same id.
>> Redis store: maintain a small-footprint Redis store with a TTL of 4 hours for each message id. I prefer the 2nd one.
>> 
>> Let me know if you have any questions and I will be glad to assist. 
>> Thanks,
>> Venkat
>> 
>>> On Sep 12, 2015, at 7:56 AM, Nathan Leung <nc...@gmail.com> wrote:
>>> 
>>> Don't fail the tuple, just drop it (don't emit). Btw the user list is
>>> better for this type of question.
>>> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <Sa...@symantec.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> As far as I know, Storm follows the "at least once" model, which means
>>>> it makes sure each tuple gets fully processed at least once. So my
>>>> question is: if I receive some unexpected data, a certain bolt in my
>>>> topology will start failing those tuples. The spout will get the failure
>>>> notification from the acker thread and will re-emit them. Since I know
>>>> they are always going to fail, is there any way to tell the spout to
>>>> stop replaying a tuple after X attempts?
>>>> 
>>>> Thanks,
>>>> Sachin
>>>> 

Re: How to stop the spout from replaying a tuple after repeated downstream failures?

Posted by Venkat Gmail <mv...@gmail.com>.
Not sure whether you received my answer. Hence resenting. 

Thanks,
Venkat

> On Sep 12, 2015, at 9:52 AM, Venkat Gmail <mv...@gmail.com> wrote:
> 
> Here are a bunch of solutions.
> Don't drop the tuple; always ack it.
> 
> 1) Try using an exactly-once topology like Trident.
> 2) If you want to use a regular Storm topology, which guarantees at-least-once (>=1) processing:
> A) Override the Kafka spout and handle failures differently, or
> B) In your bolt, store a unique message id for each message when you put it into Kafka. These message ids can be kept in a Redis store. If a message id already exists (i.e. it was processed already), just ack the tuple; otherwise process it and then ack it. This way you achieve 100% accuracy with a regular Storm topology, without Trident. For the message id, you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse the same id.
> Redis store: maintain a small-footprint Redis store with a TTL of 4 hours for each message id. I prefer the 2nd one.
> 
> Let me know if you have any questions and I will be glad to assist. 
> Thanks,
> Venkat
> 
>> On Sep 12, 2015, at 7:56 AM, Nathan Leung <nc...@gmail.com> wrote:
>> 
>> Don't fail the tuple, just drop it (don't emit). Btw the user list is
>> better for this type of question.
>> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <Sa...@symantec.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> As far as I know, Storm follows the "at least once" model, which
>>> means it makes sure each tuple gets fully processed at least once.
>>> So my question is: if I receive some unexpected data, a certain bolt
>>> in my topology will start failing those tuples. The spout will get
>>> the failure notification from the acker thread and will re-emit
>>> them. Since I know they are always going to fail, is there any way
>>> to tell the spout to stop replaying a tuple after X attempts?
>>> 
>>> Thanks,
>>> Sachin
>>> 

Re: How to stop the spout from replaying a tuple after repeated downstream failures?

Posted by Venkat Gmail <mv...@gmail.com>.
Here are a bunch of solutions.
Don't drop the tuple; always ack it.

1) Try using an exactly-once topology like Trident.
2) If you want to use a regular Storm topology, which guarantees at-least-once (>=1) processing:
A) Override the Kafka spout and handle failures differently, or
B) In your bolt, store a unique message id for each message when you put it into Kafka. These message ids can be kept in a Redis store. If a message id already exists (i.e. it was processed already), just ack the tuple; otherwise process it and then ack it. This way you achieve 100% accuracy with a regular Storm topology, without Trident. For the message id, you can use the existing Kafka message id, or create a unique GUID for each message before first putting it into Kafka and reuse the same id.
Redis store: maintain a small-footprint Redis store with a TTL of 4 hours for each message id. I prefer the 2nd one.
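
A minimal sketch of option B above, assuming Jedis as the Redis client; the class name, field names, tuple field, and Redis endpoint are illustrative. A Redis SET with NX and EX gives an atomic "mark if unseen" with the 4-hour TTL, so a replayed tuple whose message id is already recorded is simply acked and skipped:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

public class DedupBolt extends BaseRichBolt {
    private static final int TTL_SECONDS = 4 * 60 * 60; // the 4-hour footprint per id
    private OutputCollector collector;
    private transient Jedis jedis;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.jedis = new Jedis("localhost", 6379);      // assumed Redis endpoint
    }

    @Override
    public void execute(Tuple tuple) {
        String messageId = tuple.getStringByField("messageId"); // the unique id / GUID
        // Atomically record the id only if it is unseen; "OK" means first sighting,
        // null means the id already exists and the message was processed before.
        String reply = jedis.set(messageId, "1", "NX", "EX", TTL_SECONDS);
        if ("OK".equals(reply)) {
            process(tuple);                             // first time: do the real work
        }
        collector.ack(tuple);                           // ack either way; never fail
    }

    private void process(Tuple tuple) {
        // business logic goes here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt in this sketch: no output stream declared
    }
}

Acking in both branches is what keeps the spout from replaying the duplicate; the TTL bounds the Redis footprint to the replay window.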

Let me know if you have any questions and I will be glad to assist. 
Thanks,
Venkat

> On Sep 12, 2015, at 7:56 AM, Nathan Leung <nc...@gmail.com> wrote:
> 
> Don't fail the tuple, just drop it (don't emit). Btw the user list is
> better for this type of question.
> On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <Sa...@symantec.com>
> wrote:
> 
>> Hi,
>> 
>> As far as I know, Storm follows the "at least once" model, which means it
>> makes sure each tuple gets fully processed at least once. So my question
>> is: if I receive some unexpected data, a certain bolt in my topology will
>> start failing those tuples. The spout will get the failure notification
>> from the acker thread and will re-emit them. Since I know they are always
>> going to fail, is there any way to tell the spout to stop replaying a
>> tuple after X attempts?
>> 
>> Thanks,
>> Sachin
>> 

Re: How to stop the spout from replaying a tuple after repeated downstream failures?

Posted by Nathan Leung <nc...@gmail.com>.
Don't fail the tuple, just drop it (don't emit). Btw the user list is
better for this type of question.
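
A minimal sketch of that approach, assuming a 2015-era (backtype.storm) build; the class name and the validation check are stand-ins. The bolt validates the input and, for the known-bad shape, acks without emitting, so the tuple tree completes and the spout never replays it:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DropBadInputBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String payload = tuple.getString(0);
        if (isUnexpected(payload)) {
            collector.ack(tuple);       // drop it: ack without emitting, so the
            return;                     // tuple tree completes and is never replayed
        }
        collector.emit(tuple, new Values(process(payload))); // anchored emit for good data
        collector.ack(tuple);
    }

    private boolean isUnexpected(String payload) {
        return payload == null || payload.isEmpty();     // stand-in validation
    }

    private String process(String payload) {
        return payload.trim();                           // stand-in business logic
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}
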
On Sep 12, 2015 7:43 AM, "Sachin Pasalkar" <Sa...@symantec.com>
wrote:

> Hi,
>
> As far as I know, Storm follows the "at least once" model, which means it
> makes sure each tuple gets fully processed at least once. So my question
> is: if I receive some unexpected data, a certain bolt in my topology will
> start failing those tuples. The spout will get the failure notification
> from the acker thread and will re-emit them. Since I know they are always
> going to fail, is there any way to tell the spout to stop replaying a
> tuple after X attempts?
>
> Thanks,
> Sachin
>