Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2011/11/02 03:12:32 UTC
[jira] [Created] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
local writes timing out cause attempt to hint to self
-----------------------------------------------------
Key: CASSANDRA-3440
URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0.2
As reported by Ramash Natarajan on the mailing list:
{noformat}
We have a 8 node cassandra cluster running 1.0.1. After running a load
test for a day we are seeing this exception in system.log file.
ERROR [EXPIRING-MAP-TIMER-1] 2011-11-01 13:20:45,350
AbstractCassandraDaemon.java (line 133) Fatal exception in thread
Thread[EXPIRING-MAP-TIMER-1,5,main]
java.lang.AssertionError: /10.19.102.12
at org.apache.cassandra.service.StorageProxy.scheduleLocalHint(StorageProxy.java:339)
at org.apache.cassandra.net.MessagingService.scheduleMutationHint(MessagingService.java:201)
at org.apache.cassandra.net.MessagingService.access$500(MessagingService.java:64)
at org.apache.cassandra.net.MessagingService$2.apply(MessagingService.java:175)
at org.apache.cassandra.net.MessagingService$2.apply(MessagingService.java:152)
at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:89)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3440:
--------------------------------------
Attachment: 3440.txt
patch to (1) allow retrying a write-to-self that timed out and (2) improve defense against cascading failure when nodes are overwhelmed but not dead.
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Fix For: 1.0.3
>
> Attachments: 3440.txt
[jira] [Resolved] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-3440.
---------------------------------------
Resolution: Fixed
Updated to use this instead and committed:
{code}
// Per-endpoint count of hints in progress; the computing map creates a zeroed counter on first lookup.
private static final Map<InetAddress, AtomicInteger> hintsInProgress = new MapMaker().concurrencyLevel(1).makeComputingMap(new Function<InetAddress, AtomicInteger>()
{
    public AtomicInteger apply(InetAddress inetAddress)
    {
        return new AtomicInteger(0);
    }
});
{code}
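For readers who want to see how a per-endpoint counter like this could bound in-flight hints, here is a minimal sketch. It is not the committed patch: the class name HintThrottle, the cap, and the tryAcquire/release methods are all hypothetical, and it uses the JDK's ConcurrentHashMap.computeIfAbsent (Java 8+) in place of Guava's computing map.

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names and the cap are assumptions, not Cassandra code.
final class HintThrottle {
    private final int maxHintsInProgress;
    private final ConcurrentMap<InetAddress, AtomicInteger> hintsInProgress =
            new ConcurrentHashMap<>();

    HintThrottle(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    // Reserve a slot for a hint to the given endpoint; returns false when the cap is reached.
    boolean tryAcquire(InetAddress target) {
        AtomicInteger counter =
                hintsInProgress.computeIfAbsent(target, k -> new AtomicInteger(0));
        if (counter.incrementAndGet() > maxHintsInProgress) {
            counter.decrementAndGet(); // undo the reservation
            return false;
        }
        return true;
    }

    // Give the slot back once the hint has been written (or abandoned).
    void release(InetAddress target) {
        AtomicInteger counter = hintsInProgress.get(target);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // Current number of reserved slots for the endpoint.
    int inProgress(InetAddress target) {
        AtomicInteger counter = hintsInProgress.get(target);
        return counter == null ? 0 : counter.get();
    }
}
```

A caller that gets false back from tryAcquire would presumably shed load (for example, by failing the write) rather than queue yet another hint; whether the real patch behaves that way is not stated in this thread.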
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.4
>
> Attachments: 3440-v2.txt, 3440.txt
[jira] [Commented] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156526#comment-13156526 ]
Jonathan Ellis commented on CASSANDRA-3440:
-------------------------------------------
The assertion can be triggered by a read-repair mutation timing out. Read-repair mutations (from RowRepairResolver.scheduleRepairs) are sent over MessagingService.
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.4
>
> Attachments: 3440.txt
[jira] [Commented] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146197#comment-13146197 ]
Sylvain Lebresne commented on CASSANDRA-3440:
---------------------------------------------
I'm obviously missing something, but I don't see how that assertion could be triggered in the first place.
More precisely, I don't see how a node can hint itself, since a callback is only registered in MessagingService through sendRR, which isn't called for local writes (unless OPTIMIZE_LOCAL_REQUESTS is false, which it shouldn't be).
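The reasoning above can be sketched roughly as follows. This is a toy model, not Cassandra's code: WriteDispatchSketch, its fields, and the string-based endpoints are all hypothetical stand-ins, and only the sendRR name and the OPTIMIZE_LOCAL_REQUESTS flag come from the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the write path described above; every name except sendRR is an assumption.
final class WriteDispatchSketch {
    private final String self;
    private final boolean optimizeLocalRequests;
    // Message id -> target endpoint, standing in for the callback map that expires entries.
    final Map<String, String> pendingCallbacks = new HashMap<>();
    final List<String> appliedLocally = new ArrayList<>();
    private int nextId = 0;

    WriteDispatchSketch(String self, boolean optimizeLocalRequests) {
        this.self = self;
        this.optimizeLocalRequests = optimizeLocalRequests;
    }

    // Local targets are applied directly; only remote targets go through sendRR.
    void write(String target, String mutation) {
        if (optimizeLocalRequests && target.equals(self)) {
            appliedLocally.add(mutation);
            return;
        }
        sendRR(target, mutation);
    }

    // Registers a callback that could later time out and trigger a hint.
    String sendRR(String target, String mutation) {
        String id = "msg-" + nextId++;
        pendingCallbacks.put(id, target);
        return id;
    }

    // True if some pending callback is addressed to this node.
    boolean hasCallbackToSelf() {
        return pendingCallbacks.containsValue(self);
    }
}
```

In this model a callback to self can only appear if some caller bypasses write and invokes sendRR directly with the local address, which is the kind of path the comment is asking about.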
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.3
>
> Attachments: 3440.txt
[jira] [Updated] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3440:
--------------------------------------
Attachment: 3440-v2.txt
v2 retains the assertion, and moves read repair updates back to the READ_REPAIR message type, where they won't be hinted. v2 also retains the code that makes hinted handoff (HH) more robust against causing coordinator OOMs.
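As a rough illustration of why the message type matters, the sketch below models a timeout handler that only hints ordinary mutations. It is an assumption-laden toy: ExpirySketch, the Verb enum, and the string endpoints are hypothetical, and the self-check is written as an explicit throw so the behavior does not depend on the JVM's -ea flag.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an expiry callback; all names here are hypothetical, not Cassandra's.
final class ExpirySketch {
    enum Verb { MUTATION, READ_REPAIR }

    private final String self;
    final List<String> hinted = new ArrayList<>();

    ExpirySketch(String self) {
        this.self = self;
    }

    // Called when a message to target expired without a reply.
    void onTimeout(Verb verb, String target) {
        if (verb != Verb.MUTATION) {
            return; // read repair updates are never hinted
        }
        scheduleHint(target);
    }

    // Mirrors the assertion in the stack trace: a node must not hint to itself.
    private void scheduleHint(String target) {
        if (target.equals(self)) {
            throw new AssertionError(target);
        }
        hinted.add(target);
    }
}
```

With this shape, a read repair update addressed to the local node that times out is dropped quietly, while the same update sent as a MUTATION would trip the self-hint check.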
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.4
>
> Attachments: 3440-v2.txt, 3440.txt
[jira] [Commented] (CASSANDRA-3440) local writes timing out cause
attempt to hint to self
Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156610#comment-13156610 ]
Sylvain Lebresne commented on CASSANDRA-3440:
---------------------------------------------
Nit: it doesn't seem that we use the row mutations saved in hintsInProgress, so maybe we could use a simple AtomicInteger rather than a full concurrent map.
But otherwise patch looks good, +1.
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
> Key: CASSANDRA-3440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Labels: hintedhandoff
> Fix For: 1.0.4
>
> Attachments: 3440-v2.txt, 3440.txt