Posted to commits@cassandra.apache.org by "Ben Chan (JIRA)" <ji...@apache.org> on 2014/11/10 23:42:36 UTC
[jira] [Commented] (CASSANDRA-5483) Repair tracing

    [ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205451#comment-14205451 ] 

Ben Chan commented on CASSANDRA-5483:
-------------------------------------

Sorry for the delay; SMART and reallocated sectors, which only really account for about 2-3 days of it. But just to say that it wasn't merely a case of hangover from too much Halloween candy.

----

Updated https://github.com/usrbincc/cassandra/tree/5483-review (currently at commit 202a2e2e5e602).

Merges cleanly with trunk (at commit d286ac7d072fe), and builds and tests cleanly.

I decided to be a little opinionated and did some refactoring along the lines of my Oct 23 message.
- Used a TraceState#waitActivity function instead of TraceState#isDone (waitActivity gets closer to doing only what it says on the tin; makes it less hairy to comment).
- Moved all exponential backoff timeout code to StorageService#createQueryThread.
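
For readers who haven't looked at the branch, here is a rough sketch of the split described in the two bullets above: a wait-for-activity primitive that only waits, and a separate caller-side function that owns the backoff timeout. This is not the code in the patch. The names TracePollSketch, ActivitySignal, signalActivity and nextTimeout are made up for illustration, and resetting the timeout to the minimum after activity is an assumption; only the doubling is taken from the patch series.

```java
// Illustrative sketch only; none of these names are from the Cassandra source.
final class TracePollSketch {

    /** Hypothetical stand-in for the activity notification on a trace state. */
    static final class ActivitySignal {
        private boolean activity = false;

        /** Called by whoever records a new trace event. */
        synchronized void signalActivity() {
            activity = true;
            notifyAll();
        }

        /**
         * Waits up to timeoutMillis for activity and does nothing else.
         * Returns true if activity was seen (and clears the flag), false on
         * timeout or interruption.
         */
        synchronized boolean waitActivity(long timeoutMillis) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (!activity) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0)
                    break; // never call wait(0), which would block indefinitely
                try {
                    wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
            boolean seen = activity;
            activity = false;
            return seen;
        }
    }

    /**
     * Caller-side backoff: double the timeout while idle, capped at maxMillis.
     * Assumption: the timeout drops back to minMillis once activity is seen.
     */
    static long nextTimeout(long currentMillis, long minMillis, long maxMillis, boolean sawActivity) {
        if (sawActivity)
            return minMillis;
        return Math.min(currentMillis * 2, maxMillis);
    }
}
```

A polling loop would then call waitActivity(timeout), query the trace table, and feed the result into nextTimeout to pick the next timeout, which keeps the waiting primitive free of any backoff policy.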

In addition, I renamed TraceState#enableNotifications to TraceState#enableActivityNotification to attempt (naming is hard) to avoid confusion with TraceState#setNotificationHandle, which is entirely unrelated.

Note: beyond having made this opinionated edit, I'm not planning to be particularly opinionated about advocating for it. All of that code should eventually go away once there is some way to get notified about table updates instead of having to do all that messy polling.

Extra note: Cassandra triggers seem to be very close to what is needed, if only they could be specified to run on a given node (i.e. the node that is being repaired). The last time I checked on this, this wasn't possible.

----

Unfiltered traces:

- The extra traces are generic message send-receive traces that existed prior to this patch. They were originally there for query tracing, which benefits from more detailed tracing.
- These extra traces were filtered out for repair up until v16 of this patch. This means that any discussions of trace messages prior to that point are referring to the filtered traces.

But I can't say that they're doing any real harm. I mean, it's only 3x the traces (estimated), and not an order of magnitude or more.

It's probably fine as it is. I certainly can't unequivocally state that there's no use for those extra traces. Besides, extra information can always be filtered out at a higher level (assuming it's tagged appropriately).
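
To make "filtered out at a higher level" concrete, here is a minimal sketch of a consumer-side filter. It assumes each trace event carries some tag identifying what produced it; the TraceEvent class and its tag field are hypothetical and do not reflect the actual system_traces schema.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; TraceEvent and its fields are hypothetical.
final class TraceFilterSketch {

    /** Hypothetical trace row: a tag naming what produced it, plus the message. */
    static final class TraceEvent {
        final String tag;
        final String activity;

        TraceEvent(String tag, String activity) {
            this.tag = tag;
            this.activity = activity;
        }
    }

    /** Keeps only the events carrying the wanted tag, preserving their order. */
    static List<TraceEvent> onlyTagged(List<TraceEvent> events, String wantedTag) {
        return events.stream()
                     .filter(e -> wantedTag.equals(e.tag))
                     .collect(Collectors.toList());
    }
}
```

With something along these lines, the generic message send-receive traces can stay in the table, and a tool that only cares about repair progress drops them when displaying.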

> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>             Fix For: 3.0
>
>         Attachments: 5483-full-trunk.txt, 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch, 5483-v07-08-Fix-brace-style.patch, 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch, 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 5483-v08-14-Poll-system_traces.events.patch, 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 5483-v10-17-minor-bugfixes-and-changes.patch, 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 5483-v14-01-squashed.patch, 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch, trunk@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, trunk@8ebeee1-5483-v01-002-simple-repair-tracing.txt, v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch, v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the way query tracing stores traces in the system keyspace. With it, you don't have to look up each log file to see what the status was and how the repair you invoked performed. Instead, you can query the repair log by session ID to see the state and stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)