Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/12/04 11:53:11 UTC
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

    [ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041410#comment-15041410 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
----------------------------------------------

* The failure detector will never return false for the local host, so the changes in the 2nd branch of commitPaxos are unnecessary.
* We're kind of dodging the hint "overload" protection on the paxos path as we don't use {{sendToHintedEndpoints}} (which in particular makes the comment on {{commitPaxosLocal}} misleading as it suggests otherwise). I think the simplest solution is to move the overload test from {{sendToHintedEndpoints}} to some {{checkOverloaded()}} method and call that in {{commitPaxos}} too.
* Instead of adding the {{droppable()}} method to {{LocalMutationRunnable}}, we should probably use {{MessagingService.DROPPABLE_VERBS.contains(verb)}}.
* In theory, we could still run into the problem of that ticket if {{OPTIMIZE_LOCAL_REQUESTS}} is {{false}}. And in fact, I believe this option is unsafe since at least CASSANDRA-4753 as we somewhat strongly assume writes to the localhost do *not* go through {{MessagingService}}. So I would suggest ditching that option. Not only is it unsafe, but it's not used anywhere by the code and it's hardcoded so you have to change the code and recompile to even use it (which means I doubt anyone has even tried it in a long long time). And if we end up needing it in the future, we'll have to figure out how to make it safe.
* Why isn't the added assertion in {{WriteCallbackInfo}} on 3.0 using {{!shouldHint}} like in the 2.1 patch?
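
To illustrate the second point, here is a minimal, self-contained sketch of pulling the overload test into a shared {{checkOverloaded()}} method that both write paths call. It is not the actual {{StorageProxy}} code: the class name {{HintOverloadSketch}}, the {{MAX_HINTS_IN_PROGRESS}} threshold, the {{hintsInProgress}} counter, the simplified condition and the method signatures are all assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the hint back-pressure logic.
// None of these names or signatures are taken from the real StorageProxy.
class HintOverloadSketch {
    // Assumed threshold; the real limit is computed elsewhere.
    static final int MAX_HINTS_IN_PROGRESS = 2;

    // Assumed global counter of hints currently being written.
    static final AtomicInteger hintsInProgress = new AtomicInteger();

    static class OverloadedException extends RuntimeException {
        OverloadedException(String message) {
            super(message);
        }
    }

    // The overload test, extracted so that every write path can share it.
    static void checkOverloaded(String endpoint) {
        if (hintsInProgress.get() > MAX_HINTS_IN_PROGRESS) {
            throw new OverloadedException(
                "Too many in-flight hints; refusing write to " + endpoint);
        }
    }

    // Regular write path: previously the only place that ran the test.
    static void sendToHintedEndpoints(String endpoint) {
        checkOverloaded(endpoint);
        // ... the mutation would be sent (or hinted) here ...
    }

    // Paxos commit path: now runs the same test instead of bypassing it.
    static void commitPaxos(String endpoint) {
        checkOverloaded(endpoint);
        // ... the commit would be sent (or hinted) here ...
    }
}
```

With the check in one place, a comment claiming that the paxos path gets the same overload protection as regular writes would actually hold, and any later change to the condition applies to both paths.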


> java.lang.AssertionError in StorageProxy.submitHint
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: CentOS 6, Oracle JVM 1.8.45
>            Reporter: Severin Leonhardt
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 CassandraDaemon.java:223 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>         at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients get a lot of timeouts. We need to restart the affected Cassandra node to get back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the assert from being logged. At some point the read latency becomes bad again. Restarting the node where hinted handoff was disabled results in the read latency being better again.


