Posted to dev@storm.apache.org by Jon Logan <jm...@buffalo.edu> on 2013/12/09 22:20:41 UTC

Tracking of Failed Workers/Bolts

I'm not sure if this has been discussed before, but I think it'd be great
if there were a way to keep track of failed workers/bolts. Occasionally a
bolt will restart for various reasons (generally either a timeout or an
NPE inside the bolt), and it is rather difficult to work out what happened
and why. Part of the problem is that the bolt restarts as soon as it
fails, so you no longer know where the failure occurred (which
worker/port).

I think the UI should be augmented to show failed bolts, along with the
reason each one failed (e.g. timed out) and a link to the log viewer at
the point in time when the failure occurred.

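To make the idea a bit more concrete, here is a rough sketch of the kind of
record the UI would need for each failure. Everything in it is hypothetical:
the WorkerFailure type, its field names, and the log-viewer URL layout are
made up for illustration and are not part of Storm.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WorkerFailure:
    """Hypothetical record of one worker/bolt failure (not a Storm type)."""
    topology: str
    component: str       # id of the bolt that failed
    host: str            # machine the worker was running on
    port: int            # worker port at the time of the failure
    reason: str          # e.g. "timed out" or "NullPointerException"
    failed_at: datetime  # when the failure was observed

    def log_link(self, base_url: str) -> str:
        """Build an illustrative log-viewer link; the URL layout is invented."""
        ts = int(self.failed_at.timestamp())
        return f"{base_url}/log?host={self.host}&port={self.port}&at={ts}"


# Example values only; none of these come from a real cluster.
failure = WorkerFailure(
    topology="example-topology",
    component="example-bolt",
    host="supervisor-1",
    port=6700,
    reason="timed out",
    failed_at=datetime(2013, 12, 9, 22, 20, 41, tzinfo=timezone.utc),
)
print(failure.log_link("http://ui.example"))
```

The point is just that the host, port, reason and timestamp have to be
captured at the moment of failure, before the restart discards them.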

Does anyone else have similar issues? I may be able to work on this a bit,
but I'm not sure how difficult it would be to implement. Much of the
information would have to come from Nimbus, which may or may not even
have it.