Posted to user@storm.apache.org by Hefeng Yuan <hf...@rhapsody.com> on 2014/11/26 22:40:47 UTC

New kafka spout

Hello, 

I’m trying to use HolmesNL/kafka-spout. It worked pretty well for the happy path; however, when a tuple fails (e.g. _collector.fail(input) gets called in the bolt), it seems to retry only 3 or 4 times and then hang there until supervisor.worker.timeout.secs is reached and the topology gets restarted.
Just wondering where this number of retries is controlled, and also, since the tuple already failed, why would it still trigger supervisor.worker.timeout.secs?

Thanks,
Hefeng
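
For reference, a minimal sketch of the bolt-side ack/fail pattern described above, assuming a plain BaseRichBolt on the Storm 0.9.x API; the class name and the process() helper are illustrative, not from the original post:

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import java.util.Map;

public class ProcessingBolt extends BaseRichBolt {
    private OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            process(input);            // hypothetical business logic
            _collector.ack(input);     // success: acknowledge so the spout forgets the tuple
        } catch (Exception e) {
            _collector.fail(input);    // failure: ask the spout to replay the tuple
        }
    }

    private void process(Tuple input) { /* ... */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}

Note that how many times a failed tuple is actually replayed is decided by the spout implementation, not by the bolt.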

Re: New kafka spout

Posted by Hefeng Yuan <hf...@rhapsody.com>.
Figured it out myself: it’s because I throw FailedException() after calling _collector.fail(input). Apparently the HolmesNL spout doesn’t handle FailedException, and the whole worker process halted. As soon as I removed that exception, retry works well.
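
A sketch of the change described above (illustrative, not a verbatim diff of the original bolt): fail the tuple and return normally instead of also throwing FailedException, which this spout apparently doesn't handle.

@Override
public void execute(Tuple input) {
    try {
        process(input);                          // hypothetical business logic
        _collector.ack(input);
    } catch (Exception e) {
        _collector.fail(input);                  // let the spout replay the tuple
        // removed: throw new FailedException(e);  // with this spout it halted the whole worker
    }
}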


Re: New kafka spout

Posted by Hefeng Yuan <hf...@rhapsody.com>.
Thanks for the reply, yeah that explains the auto-restart part, but it’s still not clear why it retries 4 times and then stops.

I did start with the official Kafka spout, but it totally doesn’t work for me: it loses messages and constantly restarts the worker with timeouts.

Is anyone else also using the HolmesNL spout? Wondering how you guys deal with failed-tuple retries.
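
At the topology level, one hedged sketch of the knobs that govern replay of failed tuples (values are illustrative, not from this thread): a tuple is replayed whenever the bolt fails it or the message timeout expires, while the retry count and backoff are left to the spout implementation itself.

import backtype.storm.Config;

public class TopologyConfigSketch {
    static Config build() {
        Config conf = new Config();
        conf.setMessageTimeoutSecs(60); // topology.message.timeout.secs: replay if not acked in time
        conf.setMaxSpoutPending(500);   // cap in-flight tuples so replays don't pile up
        conf.setNumAckers(2);           // acker tasks that track each tuple tree
        return conf;
    }
}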




Re: New kafka spout

Posted by Harsha <st...@harsha.io>.

If your bolt hangs, it will cause workers to stop sending heartbeats;
supervisor.worker.timeout.secs then triggers, causing the workers to be
killed and restarted. Did you try using
https://github.com/apache/storm/tree/master/external/storm-kafka ?
-Harsha
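
For anyone trying that suggestion, a minimal wiring sketch for the external storm-kafka spout; the ZooKeeper host, topic, zkRoot, consumer id, and component names are illustrative, and ProcessingBolt is the hypothetical bolt sketched earlier in this thread:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaTopology {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble, topic, zkRoot, and consumer id are placeholders.
        ZkHosts hosts = new ZkHosts("zk1:2181");
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "events", "/kafka-spout", "my-consumer");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("processing-bolt", new ProcessingBolt(), 2)
               .shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("kafka-topology", new Config(), builder.createTopology());
    }
}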
