Posted to commits@cassandra.apache.org by "David Alves (JIRA)" <ji...@apache.org> on 2012/08/20 14:28:39 UTC

[jira] [Comment Edited] (CASSANDRA-1123) Allow tracing query details

    [ https://issues.apache.org/jira/browse/CASSANDRA-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437816#comment-13437816 ] 

David Alves edited comment on CASSANDRA-1123 at 8/20/12 11:28 PM:
------------------------------------------------------------------

Attached is a patch that implements almost everything mentioned:

- Tracing is implemented asynchronously with a new Stage backed by a thread pool with a single thread that refuses to execute when its queue is full (as suggested, a warning is logged; experiments show that under heavy load a traceProbability of 0.001 still works without rejecting events). This thread pool is also the only one that does not propagate trace events.
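The rejection behaviour described above can be sketched with a standard ThreadPoolExecutor. This is an illustrative sketch only: the queue capacity, class name, and logging call are assumptions, not the values used in the patch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TraceStageSketch {
    static final AtomicInteger dropped = new AtomicInteger();

    // Single worker thread and a bounded queue; capacity is illustrative.
    static ThreadPoolExecutor newTraceExecutor(int queueCapacity) {
        return new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                (r, e) -> {
                    // Queue full: drop the trace event and warn, rather than
                    // block or slow down the thread doing real work.
                    dropped.incrementAndGet();
                    System.err.println("WARN: trace event dropped, queue full");
                });
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor exec = newTraceExecutor(1);
        CountDownLatch block = new CountDownLatch(1);
        // Occupy the single worker thread.
        exec.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} });
        exec.execute(() -> {});   // fills the queue (capacity 1)
        exec.execute(() -> {});   // rejected: queue already full
        block.countDown();
        exec.shutdown();
        exec.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("dropped=" + dropped.get()); // prints "dropped=1"
    }
}
```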

- Allows tracing to be enabled/disabled from the CLI. enable tracing takes two parameters: traceProbability (the probability that any single request from that client gets traced) and maxTraceNumber, which sets a maximum number of traces to perform (-1 sets this to Integer.MAX_VALUE, which is also the default).
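The per-request sampling decision could be sketched as below; the class, field, and method names here are illustrative guesses, not the identifiers used in the patch.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class TraceSamplerSketch {
    private final double traceProbability;
    private final int maxTraceNumber;          // -1 means "no limit" (Integer.MAX_VALUE)
    private final AtomicInteger traced = new AtomicInteger();

    public TraceSamplerSketch(double traceProbability, int maxTraceNumber) {
        this.traceProbability = traceProbability;
        this.maxTraceNumber = maxTraceNumber < 0 ? Integer.MAX_VALUE : maxTraceNumber;
    }

    /** Decide whether the current request should start a tracing session. */
    public boolean shouldTrace() {
        if (traced.get() >= maxTraceNumber)
            return false;
        // Coin flip weighted by traceProbability.
        if (ThreadLocalRandom.current().nextDouble() >= traceProbability)
            return false;
        // Reserve one slot; back off if a concurrent request won the race.
        return traced.incrementAndGet() <= maxTraceNumber;
    }

    public static void main(String[] args) {
        TraceSamplerSketch sampler = new TraceSamplerSketch(1.0, 3);
        int count = 0;
        for (int i = 0; i < 10; i++)
            if (sampler.shouldTrace())
                count++;
        System.out.println("traced " + count + " of 10 requests"); // traced 3 of 10 requests
    }
}
```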

- Adds the ability to enable tracing in stress (using -tr probability [optionally maxNumTraces]).

- TraceEvents can be built using a fluent builder (TraceEventBuilder), which is also able to deserialize events (both from thrift and from IColumns).
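A fluent builder along these lines might look as follows; the field set and names are a simplified sketch of what TraceEventBuilder could expose, not the actual patch code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class TraceEventSketch {
    public final String name;
    public final String type;        // e.g. SESSION_START, STAGE_START, MISC
    public final InetAddress source;
    public final long durationNanos;

    private TraceEventSketch(Builder b) {
        this.name = b.name;
        this.type = b.type;
        this.source = b.source;
        this.durationNanos = b.durationNanos;
    }

    public static class Builder {
        private String name;
        private String type;
        private InetAddress source;
        private long durationNanos;

        public Builder name(String name) { this.name = name; return this; }
        public Builder type(String type) { this.type = type; return this; }
        public Builder source(InetAddress source) { this.source = source; return this; }
        public Builder duration(long nanos) { this.durationNanos = nanos; return this; }
        public TraceEventSketch build() { return new TraceEventSketch(this); }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Build an event resembling the apply_mutation trace shown later.
        TraceEventSketch event = new Builder()
                .name("apply_mutation")
                .type("MISC")
                .source(InetAddress.getByName("127.0.0.2"))
                .duration(25000L)
                .build();
        System.out.println(event.name + " took " + event.durationNanos + " ns");
    }
}
```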

- All requests in CassandraServer start a tracing session when tracing is enabled, and all parameters are stored along with the request details. This is done using a ThriftType that is able to serialize and deserialize thrift objects to and from Cassandra.

- User Request/Reply, Stage Start/Finish and Message Request/Reply are traced, along with specific custom events (such as apply_mutation and get_column_family).

- The CLI contains two new commands to explore traces:

show tracing summary [request_name] - displays a summary for a request type:

{code}
Summary for sessions of request: batch_mutate
Total Sessions: 500
Total Events: 3190
                       ==============================================================================
                       |    Avg.    |   StdDev.  |   Max.     |    Min.    |     99%   |     Unit   |
=====================================================================================================
| Latency              |      22.48 |      28.68 |     293.90 |       0.56 |     150.38|       msec |
|---------------------------------------------------------------------------------------------------|
| Batch Rows           |       1.00 |       0.00 |       1.00 |       1.00 |       1.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Mutations            |       5.00 |       0.00 |       5.00 |       5.00 |       5.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Deletions            |       0.00 |       0.00 |       0.00 |       0.00 |       0.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Columns              |       5.00 |       0.00 |       5.00 |       5.00 |       5.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Counters             |       0.00 |       0.00 |       0.00 |       0.00 |       0.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Super Col.           |       0.00 |       0.00 |       0.00 |       0.00 |       0.00| amount/req |
|---------------------------------------------------------------------------------------------------|
| Written Bytes        |     170.00 |       0.00 |     170.00 |     170.00 |     170.00| amount/req |
|---------------------------------------------------------------------------------------------------|

Quickest Request sessionId: 135b0c60-eac0-11e1-0000-fe8ebeead9ff
Slowest  Request sessionId: f09c3e10-eabf-11e1-0000-fe8ebeead9ff
{code}

explain trace session [sessionId] - displays the complete set of events of a tracing session along with a to-scale timeline of its execution.

{code}
Session Summary: f09c3e10-eabf-11e1-0000-fe8ebeead9ff
Request: batch_mutate
Coordinator: /127.0.0.1
Interacting nodes: 2 {[/127.0.0.1, /127.0.0.2]}
Duration: 293901000 nano seconds (293.901 msecs)
Consistency Level: ONE
Request Timeline:

||--------------------------------------------------------------------------------------------------|
Se;/Msg                                                                                             /Se

                                                                                              |-||   
                                                                                              Msg,apply_mutation;St:[Mutation];/St:[Mutation]   


Caption: 
Se              - Session start (start of request: batch_mutate)
/Se             - Session end (end of request: batch_mutate)
St[StageName]   - Begin of Stage with StageName (e.g. Mutation)
/St[StageName]  - End of Stage with StageName
Msg             - Arrival of message from coordinator
/Msg            - Reply to message from coordinator
???             - Custom traces 

NOTE: timelines are drawn to scale but precision is limited to 100 positions.
 When multiple events occur in the same position (due to lack of precision) they are separated by ','
 When two positions are too close together trace names are separated by ';'

Full event list (in order of occurrence)

Id: f09c3e11-eabf-11e1-0000-fe8ebeead9ff Name: batch_mutate           Type: SESSION_START          Source: /127.0.0.1             Timestamp (millis): 1345464589425          Duration (nanos): 73000                 
Id: f09c8c30-eabf-11e1-0000-fe8ebeead9ff Name: MessageDeparture[47681] Type: MESSAGE_DEPARTURE      Source: /127.0.0.1             Timestamp (millis): 1345464589427          Duration (nanos): 2003000               
Id: f0c63440-eabf-11e1-0000-7fd66bb03abf Name: MessageArrival[47681]  Type: MESSAGE_ARRIVAL        Source: /127.0.0.2             Timestamp (millis): 1345464589700          Duration (nanos): 47000                 
Id: f0c745b0-eabf-11e1-0000-7fd66bb03abf Name: MutationStage          Type: STAGE_START            Source: /127.0.0.2             Timestamp (millis): 1345464589707          Duration (nanos): 7045000               
Id: f0c745b1-eabf-11e1-0000-7fd66bb03abf Name: apply_mutation         Type: MISC                   Source: /127.0.0.2             Timestamp (millis): 1345464589707          Duration (nanos): 25000                 
Id: f0c793d0-eabf-11e1-0000-7fd66bb03abf Name: MutationStage          Type: STAGE_FINISH           Source: /127.0.0.2             Timestamp (millis): 1345464589709          Duration (nanos): 9617000               
Id: f0c91a70-eabf-11e1-0000-fe8ebeead9ff Name: SESSION_END            Type: SESSION_END            Source: /127.0.0.1             Timestamp (millis): 1345464589719          Duration (nanos): 293901000          
{code}

A timeline from a fully distributed deployment would be much more balanced (the lower line would be longer); this one was from ccm.
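The to-scale positioning with 100-position precision mentioned in the timeline NOTE above can be sketched as a simple proportional mapping; the method and variable names are illustrative, not taken from the patch.

```java
public class TimelineSketch {
    /**
     * Map an event timestamp onto one of 100 character positions,
     * scaled between session start and session end.
     */
    static int position(long sessionStartMillis, long sessionEndMillis, long eventMillis) {
        long span = sessionEndMillis - sessionStartMillis;
        if (span == 0)
            return 0;
        long offset = eventMillis - sessionStartMillis;
        // 100-position resolution, clamped to the last column; events mapping
        // to the same position would be joined with ',' when rendered.
        return (int) Math.min(99, (offset * 100) / span);
    }

    public static void main(String[] args) {
        // Timestamps (millis) from the sample batch_mutate session above.
        long start = 1345464589425L, end = 1345464589719L;
        // The MessageArrival event on /127.0.0.2 lands near the right edge,
        // matching the lower line of the sample timeline.
        System.out.println(position(start, end, 1345464589700L)); // prints 93
    }
}
```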

NOTES:

- Some parameters are hard-coded (such as the tracing queue size, the TTL for trace events and the number of trace events to fetch for the summaries). It might be better to make these configurable somewhere, but I've refrained from adding more config variables without consensus.

- There should be more traces.

- I implemented a distributed system keyspace meant to be used elsewhere, but in the end reverted to using the trace keyspace; the reasoning was that a distributed system table might want different settings than a tracing system (replication factor, for instance). Since tracing has its own keyspace I also kept its CFMetadata there (I'm glad to move it if that makes sense even without the distributed system table, or if the consensus is that trace should live there).

- The summarizing CLI command requires the commons-math jar (for the statistics).
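For illustration, the statistics shown in the summary table (average, standard deviation, percentile) can be reproduced with plain Java; the patch itself relies on commons-math for these, and the percentile estimator below is a simplified nearest-rank version, not commons-math's interpolating one.

```java
import java.util.Arrays;

public class SummaryStatsSketch {
    static double mean(double[] v) {
        double s = 0;
        for (double x : v) s += x;
        return s / v.length;
    }

    static double stddev(double[] v) {
        double m = mean(v), s = 0;
        for (double x : v) s += (x - m) * (x - m);
        return Math.sqrt(s / (v.length - 1)); // sample standard deviation
    }

    static double percentile(double[] v, double p) {
        double[] sorted = v.clone();
        Arrays.sort(sorted);
        // Nearest-rank estimate; commons-math interpolates slightly differently.
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        // Hypothetical per-session latencies in msec, not real measurements.
        double[] latenciesMsec = { 0.56, 5.2, 22.48, 150.38, 293.90 };
        System.out.printf("avg=%.2f stddev=%.2f p99=%.2f%n",
                mean(latenciesMsec), stddev(latenciesMsec),
                percentile(latenciesMsec, 99));
    }
}
```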

- I ended up changing the schema to:

{code}
CREATE KEYSPACE trace WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor = '1';

CREATE TABLE trace.trace_events(
  sessionId         timeuuid,
  coordinator       inet,
  eventId           timeuuid,
  description       text,
  duration          bigint,
  happened_at       timestamp,
  name              text,
  payload_types     map<text, text>,
  payload           map<text, blob>,
  source            inet,
  type              text,
  PRIMARY KEY (sessionId, coordinator, eventId));

CREATE INDEX idx_name ON trace.trace_events (name);
{code}

Note the missing Sessions table (all the info there was redundant) and the added payload fields.

- Finally, I'm having issues using the index from thrift. Is there an incompatibility (i.e., when using an IndexExpression with a CQL3 table)?

- Tests pass
                
> Allow tracing query details
> ---------------------------
>
>                 Key: CASSANDRA-1123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1123
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2.0
>
>         Attachments: 1123-3.patch.gz, 1123.patch, 1123.patch
>
>
> In the spirit of CASSANDRA-511, it would be useful to turn on tracing for queries to see where latency is coming from: how long did row cache lookup take?  key search in the index?  merging the data from the sstables?  etc.
> The main difference vs setting debug logging is that debug logging is too big of a hammer; by turning on the flood of logging for everyone, you actually distort the information you're looking for.  This would be something you could set per-query (or more likely per connection).
> We don't need to be as sophisticated as the techniques discussed in the following papers but they are interesting reading:
> http://research.google.com/pubs/pub36356.html
> http://www.usenix.org/events/osdi04/tech/full_papers/barham/barham_html/
> http://www.usenix.org/event/nsdi07/tech/fonseca.html
