You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by John Song <js...@facebook.com> on 2009/10/09 00:11:24 UTC

Proposal to add per call tracing to thrift

Hi,

I am proposing an additional to the thrift implementation that enable per call tracing.

Objective:
Currently, a multi thread thrift service has no tracing capability built in.  A per call tracing works as following:
Assume that we have a thrift service MyService.  With a php client, it looks like

$client = new MyService(...);

// a normal call
$ret = $client->foo();

// enable tracing for the next call
Ttracer::triggerLevel(Ttracer::LEVEL_INFO);

// call again
$ret = $client->foo();

// get trace data
$data = Ttracer::getData();

// dump data
foreach ($data as $d) {
    echo $d[0] . “: “ . $d[1] . “\n”;
}
// where $d[0] is a timestamp of a float and precision in usc, and $d[1] is a string.

On the server side, in the service handler, we sprinkle the following calls (c++ server):

TTracer::trace(Ttracer::LEVEL_ERROR, “max iteration reached.  Bail out.”);

Why we need it:

 1.  For runtime debugging, profiling.
 2.  For monitoring.

Background:
While each service are free to add additional parameters to encapsulating the addtion of tracer flag, and tracer data, The proposed scheme will:

 1.  completely backward compatible.  Old server with new client, new server with old client are all interoperable.
 2.  No need to modify existing thrift definition, or change new definition to accommodate tracing.
 3.  only change at the protocol level.
 4.  Almost no overhead.

The proposal takes advantages of the following facts:

 1.  Thrift doesn’t defined how to encapsulate a function call, specifically, how function parameters and function results are encapsulated.  Thrift implementation uses struct for the function argument and function result, i.e. All function arguments are grouped together in generated structs: $(SERVICE)_$(FUNCNAME)_args, $(SERVICE)_$(FUNCNAME)_pargs for function arguments, and $(SERVICE)_$(FUNCNAME)_result for function results.
 2.  a thrift generated struct’s read function skip any field who’s field id it doesn’t understand so a field injection is safe with services built with older thrift generator.
 3.  Each service request is handled by one thread.

The proposal calls for the creation of a Ttracer class that use Thread Local Storage (TLS) to store tracer flag and tracer data.  Tracer flag is injected into a function argument if tracing is requested on the client side.  On the server side, that trace flag is extracted in struct reader by noticing a well known field id, 0x1001.  Server tracing is enabled.  Trace calls are executed and data collected.  When writing function result back to client, if tracing is enabled, and there is trace data, then trace data is injected into function result struct using another well known field id: 0x1002.

Proposal:

 1.  reserve field id larger than 0x1000 for internal use.  Field id is of type i16.  It is reasonable to assume that there will be no reasonable function that has more than 4096 parameters.
 2.  reserve 0x1001 field id for setting trace level, which is of type i32.
 3.  reserve 0x1002 field id for trace data, which is of type, in c++ parlance, vector<pair<double, string> >.
 4.  in struct reader, add a case statement such that when 0x1001 is encountered extract trace level on the server side and set TLS storage trace level.
 5.  in struct reader, add a case statement such that when 0x1002 is encountered extract trace data on the client side and store it in a TLS variable.
 6.  in struct writer, if the strut is a function result then injecting trace data into the function result with field id 0x1002
 7.  in struct writer, if the struct is a function argument then inject trace level into the function parameter with field id 0x1001.

Overhead:
The overhead of this change should be minimum to none when tracing is not enabled.  The injection and  extraction only happens when enabled.  Testing for whether or not tracing is enabled is an int var comparison.

I have completed coding on cpp parser, generator, client, server implementation, php generator and client (there is no php server) and am pending reviews.

Please let me know your thoughts.

Thanks,

-john song

Re: Proposal to add per call tracing to thrift

Posted by Todd Lipcon <to...@cloudera.com>.
Hey John,

This sounds like it could be a useful extension. Could you open a JIRA?
Proposals for new features usually work best in JIRA format so the
discussion ends up next to the resulting patches, reviews, etc.

-Todd

On Thu, Oct 8, 2009 at 3:11 PM, John Song <js...@facebook.com> wrote:

> Hi,
>
> I am proposing an additional to the thrift implementation that enable per
> call tracing.
>
> Objective:
> Currently, a multi thread thrift service has no tracing capability built
> in.  A per call tracing works as following:
> Assume that we have a thrift service MyService.  With a php client, it
> looks like
>
> $client = new MyService(...);
>
> // a normal call
> $ret = $client->foo();
>
> // enable tracing for the next call
> Ttracer::triggerLevel(Ttracer::LEVEL_INFO);
>
> // call again
> $ret = $client->foo();
>
> // get trace data
> $data = Ttracer::getData();
>
> // dump data
> foreach ($data as $d) {
>    echo $d[0] . “: “ . $d[1] . “\n”;
> }
> // where $d[0] is a timestamp of a float and precision in usc, and $d[1] is
> a string.
>
> On the server side, in the service handler, we sprinkle the following calls
> (c++ server):
>
> TTracer::trace(Ttracer::LEVEL_ERROR, “max iteration reached.  Bail out.”);
>
> Why we need it:
>
>  1.  For runtime debugging, profiling.
>  2.  For monitoring.
>
> Background:
> While each service are free to add additional parameters to encapsulating
> the addtion of tracer flag, and tracer data, The proposed scheme will:
>
>  1.  completely backward compatible.  Old server with new client, new
> server with old client are all interoperable.
>  2.  No need to modify existing thrift definition, or change new definition
> to accommodate tracing.
>  3.  only change at the protocol level.
>  4.  Almost no overhead.
>
> The proposal takes advantages of the following facts:
>
>  1.  Thrift doesn’t defined how to encapsulate a function call,
> specifically, how function parameters and function results are encapsulated.
>  Thrift implementation uses struct for the function argument and function
> result, i.e. All function arguments are grouped together in generated
> structs: $(SERVICE)_$(FUNCNAME)_args, $(SERVICE)_$(FUNCNAME)_pargs for
> function arguments, and $(SERVICE)_$(FUNCNAME)_result for function results.
>  2.  a thrift generated struct’s read function skip any field who’s field
> id it doesn’t understand so a field injection is safe with services built
> with older thrift generator.
>  3.  Each service request is handled by one thread.
>
> The proposal calls for the creation of a Ttracer class that use Thread
> Local Storage (TLS) to store tracer flag and tracer data.  Tracer flag is
> injected into a function argument if tracing is requested on the client
> side.  On the server side, that trace flag is extracted in struct reader by
> noticing a well known field id, 0x1001.  Server tracing is enabled.  Trace
> calls are executed and data collected.  When writing function result back to
> client, if tracing is enabled, and there is trace data, then trace data is
> injected into function result struct using another well known field id:
> 0x1002.
>
> Proposal:
>
>  1.  reserve field id larger than 0x1000 for internal use.  Field id is of
> type i16.  It is reasonable to assume that there will be no reasonable
> function that has more than 4096 parameters.
>  2.  reserve 0x1001 field id for setting trace level, which is of type i32.
>  3.  reserve 0x1002 field id for trace data, which is of type, in c++
> parlance, vector<pair<double, string> >.
>  4.  in struct reader, add a case statement such that when 0x1001 is
> encountered extract trace level on the server side and set TLS storage trace
> level.
>  5.  in struct reader, add a case statement such that when 0x1002 is
> encountered extract trace data on the client side and store it in a TLS
> variable.
>  6.  in struct writer, if the strut is a function result then injecting
> trace data into the function result with field id 0x1002
>  7.  in struct writer, if the struct is a function argument then inject
> trace level into the function parameter with field id 0x1001.
>
> Overhead:
> The overhead of this change should be minimum to none when tracing is not
> enabled.  The injection and  extraction only happens when enabled.  Testing
> for whether or not tracing is enabled is an int var comparison.
>
> I have completed coding on cpp parser, generator, client, server
> implementation, php generator and client (there is no php server) and am
> pending reviews.
>
> Please let me know your thoughts.
>
> Thanks,
>
> -john song
>