Posted to commits@cassandra.apache.org by "Josh McKenzie (Jira)" <ji...@apache.org> on 2021/08/27 16:00:00 UTC

[jira] [Comment Edited] (CASSANDRA-16880) Catch read repair timeouts and add metrics to indicate they occurred

    [ https://issues.apache.org/jira/browse/CASSANDRA-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405389#comment-17405389 ] 

Josh McKenzie edited comment on CASSANDRA-16880 at 8/27/21, 3:59 PM:
---------------------------------------------------------------------

2 test failures:
 # JDK11_unit: testGetPositionsKeyCacheStats, which passes locally for me in IDE + cmd line and appears unrelated to this diff
 # JDK11 dtest no vnode, with a teardown error on test_complementary_deletion_with_limit_on_static_column_with_empty_partitions which I also cannot reproduce locally and appears unrelated to this diff.

Pending the ML discussion about trivial improvements, we'll decide where to merge this (4.0.x vs. 4.x); it should be a clean diff to either.
||Item|Link|
|JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/67/workflows/816fdc30-7f88-4b50-86c1-5c62e18f6db5]|
|JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/67/workflows/c7f102ca-97d2-4d88-ba33-69699b4328e0]|
|Branch|[Link|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:CASSANDRA-16880?expand=1]|

edit: rebased on trunk instead of the 4.0.x line. Re-running core tests there for good measure.


was (Author: jmckenzie):
2 test failures:
 # JDK11_unit: testGetPositionsKeyCacheStats, which passes locally for me in IDE + cmd line and appears unrelated to this diff
 # JDK11 dtest no vnode, with a teardown error on test_complementary_deletion_with_limit_on_static_column_with_empty_partitions which I also cannot reproduce locally and appears unrelated to this diff.

Pending the ML discussion about trivial improvements, we'll decide where to merge this (4.0.x vs. 4.x); it should be a clean diff to either.
||Item|Link|
|JDK8 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/60/workflows/67473fbc-88f7-44d3-a409-b616e2cadbb4]|
|JDK11 tests|[Link|https://app.circleci.com/pipelines/github/josh-mckenzie/cassandra/60/workflows/15e09ea4-4d35-4035-9afc-ff0d1089041e]|
|Branch|[Link|https://github.com/apache/cassandra/compare/cassandra-4.0...josh-mckenzie:CASSANDRA-16880?expand=1]|

> Catch read repair timeouts and add metrics to indicate they occurred
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-16880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16880
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability/Metrics
>            Reporter: Josh McKenzie
>            Assignee: Josh McKenzie
>            Priority: Normal
>             Fix For: 4.1
>
>
> When we fire off async read repairs onto their own executor they may time out, and when they do, nothing stops them from propagating that timeout exception all the way up to CassandraDaemon's uncaught exception handler. When this happens we log at ERROR.
> Obviously a timeout isn't great, but it's not an ERROR, so we should trap these timeouts instead and add some metrics around this occurrence.
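
For illustration only, the idea in the description can be sketched roughly as below. This is not the code on the branch: the class, the RepairTask interface, and the AtomicLong counter are hypothetical names, and java.util.concurrent.TimeoutException stands in for whatever exception the real read repair path throws. The wrapper catches the timeout on the executor thread and bumps a counter, so the exception never reaches an uncaught exception handler.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class ReadRepairTimeoutSketch {

    // Stand-in for an async read repair that may time out.
    interface RepairTask {
        void run() throws TimeoutException;
    }

    // Wraps a task so that a timeout is counted instead of escaping
    // to the thread's uncaught exception handler.
    static Runnable trapTimeouts(RepairTask task, AtomicLong timeoutCounter) {
        return () -> {
            try {
                task.run();
            } catch (TimeoutException e) {
                // Assumption: a plain counter plays the role of the metric.
                timeoutCounter.incrementAndGet();
            }
        };
    }

    // Submits one task that times out and one that succeeds,
    // then returns how many timeouts were trapped.
    static long runDemo() {
        AtomicLong timeouts = new AtomicLong();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(trapTimeouts(() -> { throw new TimeoutException("simulated"); }, timeouts));
        executor.execute(trapTimeouts(() -> { }, timeouts));
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timeouts.get();
    }
}
```

In this sketch only the timeout is trapped; any other exception still propagates as before, which keeps genuine errors visible at ERROR.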



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
