Posted to dev@storm.apache.org by "James Xu (JIRA)" <ji...@apache.org> on 2013/12/15 08:03:06 UTC

[jira] [Created] (STORM-158) Live tracing of tuples through the flow

James Xu created STORM-158:
------------------------------

             Summary: Live tracing of tuples through the flow
                 Key: STORM-158
                 URL: https://issues.apache.org/jira/browse/STORM-158
             Project: Apache Storm (Incubating)
          Issue Type: New Feature
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/531

Storm should let you bless a record as a "tracer bullet", to be specially reported on as it progresses through the flow. It's important that this be completely transparent -- that is, I can unintrusively switch tracing on in the flow, and that tracer bullets are real, live records (not specially-crafted packets). The intent is that a small fraction of records be tracer bullets. If you harm performance by passing too many through, that's your fault for passing too many through.

@maphysics is working on code to implement this -- we're finding it very useful for debugging flows -- and so we'd like to see if this is functionality you'd pull into the mainline or storm-contrib. (If this already exists, please advise.)

Pragmatically, what we're working on is a TracerHook that...

* In 'prepare', captures some helpful information about the flow.
* The hook points (emit, boltExecute, both ack and fail) look for a field called '_trace'; if it is absent or null, they return immediately. Otherwise, the field is a HashMap indicating that the tuple should be traced. (We're using a HashMap to allow whoever designated the tuple for tracing to inject extra metadata for the trace report.)
* If it's a tracer bullet, each hook point simply writes a verbose, helpfully-formatted biography of the record to the log (the execute hook is more verbose than the others):
    * the component_id, sources & targets, etc. of the bolt/spout
    * the hook point it hit ('emit', 'ack', etc.)
    * list of the output tuple's values in regular order
* ...and records minimal provenance:
    * The execute hook saves off (into a private variable on the hook) the _trace field of the input tuple (a question about this is below)
    * Calls to the emit hook take the saved _trace info and duplicate it into the output tuple
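
To make the hook behaviour above concrete, here is a minimal, self-contained sketch. It is one reading of the description, not @maphysics's actual code. SketchTuple and TracerHookSketch are hypothetical stand-ins and do not use Storm's real Tuple or task-hook classes. The log is modelled as an in-memory list so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Storm tuple: named fields only.
// This is NOT Storm's Tuple class; it exists so the sketch runs on its own.
class SketchTuple {
    final Map<String, Object> fields = new HashMap<String, Object>();

    SketchTuple with(String name, Object value) {
        fields.put(name, value);
        return this;
    }
}

// Hypothetical hook exposing the hook points named in the proposal.
class TracerHookSketch {
    static final String TRACE_FIELD = "_trace";

    private final String componentId;
    // Stand-in for the log that the real hook would write to.
    final List<String> report = new ArrayList<String>();
    // Trace info saved by the execute hook and copied into later emits.
    private Map<String, Object> savedTrace;

    TracerHookSketch(String componentId) {
        this.componentId = componentId;
    }

    // Returns the trace map, or null if the tuple is not a tracer bullet.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> traceOf(SketchTuple tuple) {
        Object trace = tuple.fields.get(TRACE_FIELD);
        return (trace instanceof Map) ? (Map<String, Object>) trace : null;
    }

    void boltExecute(SketchTuple input) {
        savedTrace = traceOf(input); // null clears any earlier trace
        if (savedTrace == null) return;
        report.add(componentId + " execute " + input.fields);
    }

    void emit(SketchTuple output) {
        if (savedTrace == null) return;
        // Duplicate the saved trace info into the output tuple.
        output.with(TRACE_FIELD, new HashMap<String, Object>(savedTrace));
        report.add(componentId + " emit " + output.fields);
    }

    void boltAck(SketchTuple input) {
        if (traceOf(input) == null) return;
        report.add(componentId + " ack");
    }
}
```

In this sketch an untraced tuple costs one map lookup per hook point and produces no report lines. A tuple carrying a '_trace' map produces one line per hook point, and every tuple emitted while it is being processed gets its own copy of the trace metadata.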

Some Questions

Should these be metrics or hooks? Right now, we're using the hook functionality, not the metrics, because...

* It wasn't clear how to inspect the tuple from the metric
* The lifecycle of the metrics matches the bolt's, not the record's -- we'd prefer as-rapid-as-reasonable reporting, tied to the tuple's progress
* More generally, using metrics would require more spelunking to figure out. We'll follow up on the mailing list with questions on this.

So it seems like a hook is the right thing, although cycling a trace trail back to nimbus has a lot of appeal.

Are the hook points dependably executed in order? That is, if the execute hook point for bolt A on tuple Q is invoked, can we depend on the calls to emit and then to ack/fail (until execute is called again) being a direct consequence of processing tuple Q? (The code seems to say yes, but can we treat that as part of the contract?)
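
The ordering question matters because the provenance scheme keeps a single saved trace per hook. The following is a small illustration with hypothetical names (SingleSlotProvenance is not part of Storm) of what that single slot assumes. It shows only the hazard and says nothing about what Storm actually guarantees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-slot provenance tracker; not part of Storm.
class SingleSlotProvenance {
    // Trace id saved by the most recent execute, or null if that tuple was untraced.
    private String currentTrace;
    final List<String> attributions = new ArrayList<String>();

    void onExecute(String traceIdOrNull) {
        currentTrace = traceIdOrNull;
    }

    void onEmit(String outputName) {
        // An emit is attributed to whichever execute ran last.
        if (currentTrace != null) {
            attributions.add(outputName + " <- " + currentTrace);
        }
    }
}
```

If the calls arrive as execute(Q), emit, execute(R), emit, the output of Q is attributed correctly. If execute(R) for an untraced tuple R were to run before the emit caused by Q, the slot would already be overwritten and Q's trace would be lost. This is why the in-order behaviour would need to be part of the contract for this approach to work.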

How do we transparently carry the trace info all the way through the topology -- yet not annotate every single bolt/spout with trace_info as a field? @maphysics is following up with a separate issue on this. We need to decorate each tuple generated from processing a tracer bullet with a _trace field of its own -- but without modifying the topology or its bolts.

/cc @maphysics @kornypoet

As mentioned, we'd eventually like to dispatch tracings to nimbus (or somewhere central). Instead of metrics, another approach would send them to an implicit 'tracings' stream, similar to the 'failure stream' mentioned in #13. Has there been any progress on implicit failure streams?

see also #146



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)