Posted to user@storm.apache.org by Simon Cooper <si...@featurespace.co.uk> on 2014/10/13 13:58:17 UTC

Finding out why a tuple failed

Is there any possible way, either through logging or programmatically, to find out why a tuple failed? If it timed out, which bolts it was waiting for acks from in the tuple tree, and if it was explicitly failed, which bolt failed it? I'm having a hell of a time trying to debug a complex topology that is not acking any of its tuples back at the spout :(

Thanks,
SimonC

Re: Finding out why a tuple failed

Posted by Reem Bensimhon <re...@forter.com>.
Hey Vladi

Spout.fail() is called whenever a tuple in your topology fails (including timeouts), so if you see failures they are surely in a part of your topology that is anchored (that is, where acks and fails count).

Analyzing the stack for those failures would let you know if they failed due to timeout or not, but it wouldn’t reveal the identity of the bolt that failed to ack.

In our topology we have a uniform wrapper for all the bolts, and for each bolt we wrap the OutputCollector. That control over all bolts and collectors has allowed me (in extreme cases where I have no idea which bolt is the culprit for timeouts) to log all of the topology’s bolt.execute and collector.ack calls. Logging all these calls is a sure way of finding the issue: in cases of timeouts you would see more execute calls than ack calls in the problematic bolt.
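
A minimal sketch of the wrapper idea described above, not the poster's actual code. It assumes a stand-in interface, Acker, in place of the ack/fail part of Storm's OutputCollector so that the example compiles on its own; CallCounts and CountingAcker are hypothetical names. The bolt wrapper would call record(boltName, "execute") before delegating to the wrapped bolt's execute method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the ack/fail part of Storm's OutputCollector. */
interface Acker {
    void ack(Object tuple);
    void fail(Object tuple);
}

/** Counts execute/ack/fail calls per bolt so that imbalances stand out. */
class CallCounts {
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    /** Records one call of the given kind ("execute", "ack" or "fail"). */
    void record(String boltName, String call) {
        counts.computeIfAbsent(boltName + "." + call, k -> new AtomicLong())
              .incrementAndGet();
    }

    long get(String boltName, String call) {
        AtomicLong n = counts.get(boltName + "." + call);
        return n == null ? 0 : n.get();
    }

    /** Tuples handed to the bolt that it has neither acked nor failed yet. */
    long outstanding(String boltName) {
        return get(boltName, "execute") - get(boltName, "ack") - get(boltName, "fail");
    }
}

/** Wraps a collector and records every ack/fail before delegating. */
class CountingAcker implements Acker {
    private final String boltName;
    private final Acker delegate;
    private final CallCounts counts;

    CountingAcker(String boltName, Acker delegate, CallCounts counts) {
        this.boltName = boltName;
        this.delegate = delegate;
        this.counts = counts;
    }

    @Override
    public void ack(Object tuple) {
        counts.record(boltName, "ack");
        delegate.ack(tuple);
    }

    @Override
    public void fail(Object tuple) {
        counts.record(boltName, "fail");
        delegate.fail(tuple);
    }
}
```

Logging outstanding(boltName) periodically for every bolt should show a growing number for a bolt that holds tuples without acking them.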

On Oct 21, 2014, at 8:07 AM, Vladi Feigin <vl...@gmail.com> wrote:

Hi Tyson,
This implementation with ISpout.fail() stack trace is possible only if a topology runs with acks mode , isn't it?
Vladi

Re: Finding out why a tuple failed

Posted by Vladi Feigin <vl...@gmail.com>.
Hi Tyson,
This implementation with the ISpout.fail() stack trace is possible only if
the topology runs with acking enabled, isn't it?
Vladi

On Mon, Oct 20, 2014 at 7:35 PM, Tyson Norris <tn...@adobe.com> wrote:

>  We had the same problem - failures, but no explanation.
>
>  Ignoring the root cause for a moment, one thing we did was to simply
> determine where ISpout.fail() is getting called from is to generate an
> Exception and print the stack trace.
> E.g.
>         Exception ex = new Exception("stacktrace for logging fail() invocation");
>         log.info("failing msgId {}", msgId, ex);
>
>  Within the stack trace, we were able to see that all failures were due
> to timeouts. This should generally be the case if you are not getting any
> exceptions at the worker, and this is one way to prove exactly the reason
> it is getting failed.
>
>  Obviously as others pointed out, if you are dealing with FailedException
> in bolts, you can throw ReportFailedException instead to get it reported
> back to ui, but in the case of timeout (or any other case that is causing
> ISpout.fail() ) then a stack trace can be a simple way to track how this
> happened.
>
>  It would be nice to be able to generically (not customized in the spout)
> trace from messageId to what state the tuple is in (waiting for ack,
> timed-out, etc) from within the spout, but I’m not sure the best way to do
> that.
>
>  Tyson

Re: Finding out why a tuple failed

Posted by Tyson Norris <tn...@adobe.com>.
We had the same problem - failures, but no explanation.

Ignoring the root cause for a moment, one thing we did to determine where ISpout.fail() is being called from was to generate an Exception and log its stack trace.
E.g.
        Exception ex = new Exception("stacktrace for logging fail() invocation");
        log.info("failing msgId {}", msgId, ex);

Within the stack trace, we were able to see that all failures were due to timeouts. This should generally be the case if you are not getting any exceptions at the worker, and it is one way to prove exactly why the tuple is being failed.

Obviously, as others pointed out, if you are dealing with FailedException in bolts, you can throw ReportedFailedException instead to get it reported back to the UI, but in the case of a timeout (or any other case that causes ISpout.fail() to be called) a stack trace can be a simple way to track how this happened.

It would be nice to be able to generically (not customized in the spout) trace from messageId to what state the tuple is in (waiting for ack, timed-out, etc) from within the spout, but I’m not sure the best way to do that.
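
One possible spout-side approximation of that idea, as a hedged sketch rather than anything Storm provides: the spout records when each message id was emitted and, in fail(), compares the tuple's age with the configured message timeout (topology.message.timeout.secs). PendingTracker is a hypothetical helper, and the classification is only a guess: a fail() that arrives before the timeout has elapsed cannot be a timeout, while one that arrives at or after it probably is.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper that a spout could own; not part of Storm. */
class PendingTracker {
    private final Map<Object, Long> emittedAtMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    /** timeoutMillis is assumed to mirror the topology's message timeout. */
    PendingTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call when the spout emits a tuple with this message id. */
    void onEmit(Object msgId, long nowMillis) {
        emittedAtMillis.put(msgId, nowMillis);
    }

    /** Call from ack(): the tuple tree completed, so stop tracking it. */
    void onAck(Object msgId) {
        emittedAtMillis.remove(msgId);
    }

    /** Call from fail(): returns a human-readable guess at the cause. */
    String onFail(Object msgId, long nowMillis) {
        Long emitted = emittedAtMillis.remove(msgId);
        if (emitted == null) {
            return "unknown (msgId not tracked)";
        }
        long age = nowMillis - emitted;
        return age >= timeoutMillis
                ? "probable timeout after " + age + " ms"
                : "explicit fail after " + age + " ms";
    }
}
```

In a real spout the timestamps would come from System.currentTimeMillis(); they are passed in here only to keep the sketch easy to test. This does not say which bolt was holding the tuple, only whether the failure looks like a timeout.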

Tyson


On Oct 20, 2014, at 8:36 AM, Simon Cooper <si...@featurespace.co.uk> wrote:

That’s exactly the problem – our IRichBolts are quite complex, and keep hold of multiple tuples waiting for other ‘trigger’ tuples before acking several at once. With many thousands of tuples flying around the topology, it’s very hard to debug issues when one tuple randomly fails – which bolt was holding it waiting for a trigger and didn’t ack it in time? Or, if the tuple was failed manually, which bolt failed it?

RE: Finding out why a tuple failed

Posted by Simon Cooper <si...@featurespace.co.uk>.
That’s exactly the problem – our IRichBolts are quite complex, and keep hold of multiple tuples waiting for other ‘trigger’ tuples before acking several at once. With many thousands of tuples flying around the topology, it’s very hard to debug issues when one tuple randomly fails – which bolt was holding it waiting for a trigger and didn’t ack it in time? Or, if the tuple was failed manually, which bolt failed it?

From: Itai Frenkel [mailto:Itai@forter.com]
Sent: 14 October 2014 14:43
To: user@storm.apache.org
Subject: Re: Finding out why a tuple failed


Simon - Take a look at  BasicBoltExecutor#executor which is an adaptor from IBasicBolt to IRichBolt.  All collector.fail is accompanied with collector.reportError() if you rethrow exception as ReportedFailedException.



Could you please check that this is the case in your bolts too ?

In IRichBolt you would need to take care of that yourself.​



Re: Finding out why a tuple failed

Posted by Itai Frenkel <It...@forter.com>.
Sigmund --> Nimbus

________________________________
From: Itai Frenkel
Sent: Tuesday, October 14, 2014 4:42 PM
To: user@storm.apache.org
Subject: Re: Finding out why a tuple failed


Simon - Take a look at  BasicBoltExecutor#executor which is an adaptor from IBasicBolt to IRichBolt.  All collector.fail is accompanied with collector.reportError() if you rethrow exception as ReportedFailedException.


Could you please check that this is the case in your bolts too ?

In IRichBolt you would need to take care of that yourself.


Re: Finding out why a tuple failed

Posted by Itai Frenkel <It...@forter.com>.
Simon - Take a look at BasicBoltExecutor#execute, which is an adaptor from IBasicBolt to IRichBolt. Every collector.fail() there is accompanied by collector.reportError() if you rethrow the exception as a ReportedFailedException.


Could you please check that this is the case in your bolts too ?

In IRichBolt you would need to take care of that yourself.
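
A minimal sketch of what "taking care of that yourself" could look like in a bolt that fails tuples explicitly. Storm's OutputCollector has ack, fail and reportError methods; the Collector interface below is only a stand-in for them so that the example compiles without Storm, and FailureReporter is a hypothetical helper name. The idea is simply never to call fail() without first reporting which bolt failed which tuple, and why.

```java
/** Stand-in for the relevant part of Storm's OutputCollector. */
interface Collector {
    void ack(Object tuple);
    void fail(Object tuple);
    void reportError(Throwable error);
}

/** Hypothetical helper: every explicit fail is paired with a reported error. */
class FailureReporter {
    /**
     * Reports an error naming the bolt and the tuple, then fails the tuple,
     * so that the failure is visible wherever reported errors are shown.
     */
    static void failWithReason(Collector collector, String boltName,
                               Object tuple, Throwable cause) {
        collector.reportError(
                new RuntimeException(boltName + " failed tuple " + tuple, cause));
        collector.fail(tuple);
    }
}
```

A bolt would call FailureReporter.failWithReason(collector, "myBolt", tuple, ex) in its catch block instead of calling collector.fail(tuple) directly. This covers explicit failures only; a tuple that times out is never failed by a bolt, so nothing is reported for it.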


________________________________
From: Simon Cooper <si...@featurespace.co.uk>
Sent: Tuesday, October 14, 2014 12:48 PM
To: user@storm.apache.org
Subject: RE: Finding out why a tuple failed

We're seeing random failures. No exceptions in the logs, just failed tuples at the spout with no other information. We think it's timeouts, but there's no information anywhere as to which bolts in the tuple tree didn't ack or fail the event in time.

RE: Finding out why a tuple failed

Posted by Simon Cooper <si...@featurespace.co.uk>.
We’re seeing random failures. No exceptions in the logs, just failed tuples at the spout with no other information. We think it’s timeouts, but there’s no information anywhere as to which bolts in the tuple tree didn’t ack or fail the event in time.

From: Itai Frenkel [mailto:Itai@forter.com]
Sent: 14 October 2014 08:32
To: user@storm.apache.org
Subject: Re: Finding out why a tuple failed


​Let's say you have 10000 tuples processed. And only one of them reported an error and that is the same tuple that failed. They you look in Sigmund and see the error and you know for sure it relates to the failed tuple.



Now let's consider that out of 10000, half of them failed for different reasons, then looking in sigmund will still give you errors, however you would not be able to pinpoint it to a specific tuple id.





Re: Finding out why a tuple failed

Posted by Itai Frenkel <It...@forter.com>.
Let's say you have 10000 tuples processed, only one of them reported an error, and that is the same tuple that failed. Then you look in Sigmund and see the error, and you know for sure it relates to the failed tuple.


Now let's consider that out of 10000, half of them failed for different reasons. Looking in Sigmund will still give you errors, but you would not be able to pinpoint them to a specific tuple id.



________________________________
From: Vladi Feigin <vl...@gmail.com>
Sent: Monday, October 13, 2014 8:50 PM
To: user@storm.apache.org
Subject: Re: Finding out why a tuple failed

@Itai
What do you mean by "other errors" ? Are these the internal Storm errors ,which are not reported in the nimbus?
If yes, are they reported in the logs?
Vladi


Re: Finding out why a tuple failed

Posted by Vladi Feigin <vl...@gmail.com>.
@Itai
What do you mean by "other errors"? Are these the internal Storm errors,
which are not reported in Nimbus?
If yes, are they reported in the logs?
Vladi


On Mon, Oct 13, 2014 at 5:05 PM, Itai Frenkel <It...@forter.com> wrote:

>  Assuming each failure in the code is accompanied by
> collector.reportError(ex) (aka BasicBolt) then you would see an exception
> in nimbus. If there are many other errors, then it may not be the exception
> you are looking for.
>
>
>  To get more fidelity you would need to send all errors to ELK stack
> (that's what we do) and filter by id.
>
>
>  Itai

Re: Finding out why a tuple failed

Posted by Itai Frenkel <It...@forter.com>.
Assuming each failure in the code is accompanied by collector.reportError(ex) (as with BasicBolt), you would see an exception in Nimbus. If there are many other errors, then it may not be the exception you are looking for.


To get more fidelity you would need to send all errors to an ELK stack (that's what we do) and filter by id.


Itai


________________________________
From: Simon Cooper <si...@featurespace.co.uk>
Sent: Monday, October 13, 2014 2:58 PM
To: user@storm.apache.org
Subject: Finding out why a tuple failed

Is there any possible way, either through logging or programmatically, to find out why a tuple failed? If it timed out, which bolts it was waiting for acks from in the tuple tree, and if it was explicitly failed, which bolt failed it? I'm having a hell of a time trying to debug a complex topology that is not acking any of its tuples back at the spout :(

Thanks,
SimonC