Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2011/11/02 03:12:32 UTC

[jira] [Created] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

local writes timing out cause attempt to hint to self
-----------------------------------------------------

                 Key: CASSANDRA-3440
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.0
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
             Fix For: 1.0.2


As reported by Ramash Natarajan on the mailing list:

{noformat}
We have a 8 node cassandra cluster running 1.0.1. After running a load
test for a day we are seeing this exception in system.log file.

ERROR [EXPIRING-MAP-TIMER-1] 2011-11-01 13:20:45,350
AbstractCassandraDaemon.java (line 133) Fatal exception in thread
Thread[EXPIRING-MAP-TIMER-1,5,main]
java.lang.AssertionError: /10.19.102.12
       at org.apache.cassandra.service.StorageProxy.scheduleLocalHint(StorageProxy.java:339)
       at org.apache.cassandra.net.MessagingService.scheduleMutationHint(MessagingService.java:201)
       at org.apache.cassandra.net.MessagingService.access$500(MessagingService.java:64)
       at org.apache.cassandra.net.MessagingService$2.apply(MessagingService.java:175)
       at org.apache.cassandra.net.MessagingService$2.apply(MessagingService.java:152)
       at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:89)
       at java.util.TimerThread.mainLoop(Timer.java:512)
       at java.util.TimerThread.run(Timer.java:462)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3440:
--------------------------------------

    Attachment: 3440.txt

patch to (1) allow retrying a write-to-self that timed out and (2) improve defense against cascading failure when nodes are overwhelmed but not dead.
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.3
>
>         Attachments: 3440.txt


        

[jira] [Resolved] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3440.
---------------------------------------

    Resolution: Fixed

Updated to use this instead and committed:

{code}
    private static final Map<InetAddress, AtomicInteger> hintsInProgress = new MapMaker().concurrencyLevel(1).makeComputingMap(new Function<InetAddress, AtomicInteger>()
    {
        public AtomicInteger apply(InetAddress inetAddress)
        {
            return new AtomicInteger(0);
        }
    });
{code}
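
For readers on a newer JDK, the same lazily populated per-endpoint counter can be sketched without Guava's computing map. This is only an illustration, not the committed code: the class name HintsInProgressSketch and the helper counterFor are hypothetical, and ConcurrentHashMap.computeIfAbsent (Java 8+) is swapped in for MapMaker.makeComputingMap.

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone class; not the committed Cassandra code.
final class HintsInProgressSketch
{
    // One counter per target endpoint, created lazily at zero on first lookup.
    private static final ConcurrentMap<InetAddress, AtomicInteger> hintsInProgress =
        new ConcurrentHashMap<InetAddress, AtomicInteger>();

    // Hypothetical helper: returns the counter for an endpoint, creating it if absent.
    static AtomicInteger counterFor(InetAddress endpoint)
    {
        // computeIfAbsent plays the role of the computing map's Function.
        return hintsInProgress.computeIfAbsent(endpoint, k -> new AtomicInteger(0));
    }

    public static void main(String[] args) throws Exception
    {
        InetAddress target = InetAddress.getByAddress(new byte[]{ 10, 19, 102, 12 });
        counterFor(target).incrementAndGet();   // a hint write starts
        counterFor(target).incrementAndGet();   // a second hint write starts
        counterFor(target).decrementAndGet();   // the first one finishes
        System.out.println(counterFor(target).get());   // prints 1
    }
}
```

Every lookup for the same endpoint returns the same AtomicInteger, so callers can increment before writing a hint and decrement afterwards without any extra locking.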
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.4
>
>         Attachments: 3440-v2.txt, 3440.txt


        

[jira] [Commented] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156526#comment-13156526 ] 

Jonathan Ellis commented on CASSANDRA-3440:
-------------------------------------------

The assertion can be triggered by a read-repair mutation timing out.  Read-repair mutations (from RowRepairResolver.scheduleRepairs) are sent over MessagingService.
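
A minimal sketch of that failure path, under stated assumptions: the names SelfHintSketch, LOCAL and onTimeout are hypothetical, endpoints are modelled as plain strings, and the check is written as an explicit throw so that it fires even when JVM assertions are disabled. It only illustrates how a timed-out message addressed to the local node can end up in the hinting code.

```java
// Hypothetical stand-alone class; a simplified model, not Cassandra's code.
final class SelfHintSketch
{
    // Address taken from the log in the report, used here as the local node.
    static final String LOCAL = "10.19.102.12";

    // Stand-in for the check that failed in the stack trace: a node never hints itself.
    static void scheduleLocalHint(String target)
    {
        if (target.equals(LOCAL))
            throw new AssertionError("/" + target);
        // A real implementation would persist the hint for the target here.
    }

    // Stand-in for the expiry hook: a timed-out mutation is turned into a hint,
    // regardless of whether it was a client write or a read-repair mutation.
    static void onTimeout(String target)
    {
        scheduleLocalHint(target);
    }

    public static void main(String[] args)
    {
        onTimeout("10.19.102.13");   // remote replica: a hint is scheduled
        try
        {
            onTimeout(LOCAL);        // message sent to self timed out
        }
        catch (AssertionError e)
        {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```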
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.4
>
>         Attachments: 3440.txt


        

[jira] [Commented] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146197#comment-13146197 ] 

Sylvain Lebresne commented on CASSANDRA-3440:
---------------------------------------------

I'm obviously missing something, but I don't see how that assertion could be triggered in the first place.
More precisely, I don't see how a node can hint itself, since a callback is only registered in MessagingService through sendRR, which isn't called for local writes (unless OPTIMIZE_LOCAL_REQUESTS is false, which it shouldn't be).
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.3
>
>         Attachments: 3440.txt


        

[jira] [Updated] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3440:
--------------------------------------

    Attachment: 3440-v2.txt

v2 retains the assertion, and moves read repair updates back to the READ_REPAIR message type, where they won't be hinted.  v2 retains the code to make HH more robust against causing coordinator OOMs.
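
The idea of keying the hint decision on the message type can be sketched as follows. This is an assumption-laden illustration rather than the patch itself: the class HintableVerbSketch, the method shouldHintOnTimeout and the MUTATION constant are hypothetical names; only READ_REPAIR is named in the comment above.

```java
// Hypothetical stand-alone class; not taken from the attached patch.
final class HintableVerbSketch
{
    // MUTATION is an assumed name for ordinary writes; READ_REPAIR is the type named above.
    enum Verb { MUTATION, READ_REPAIR }

    // Only ordinary mutations are turned into hints when their callback expires.
    static boolean shouldHintOnTimeout(Verb verb)
    {
        return verb == Verb.MUTATION;
    }

    public static void main(String[] args)
    {
        for (Verb verb : Verb.values())
            System.out.println(verb + " hinted on timeout: " + shouldHintOnTimeout(verb));
    }
}
```

With read-repair updates sent under their own type, a timed-out repair never reaches the hinting code, so the assertion can stay in place.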
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.4
>
>         Attachments: 3440-v2.txt, 3440.txt


        

[jira] [Commented] (CASSANDRA-3440) local writes timing out cause attempt to hint to self

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156610#comment-13156610 ] 

Sylvain Lebresne commented on CASSANDRA-3440:
---------------------------------------------

Nit: it doesn't seem that we use the row mutations saved in hintsInProgress, so maybe we could use a simple AtomicInteger rather than a full concurrent map.

But otherwise patch looks good, +1.
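
As a rough illustration of the nit, a single shared counter could look something like the sketch below. Everything in it is an assumption made for the example: the class GlobalHintCounterSketch, the method tryWriteHint and the limit MAX_HINTS_IN_PROGRESS are hypothetical, and refusing work above a limit is just one plausible way to protect a coordinator from running out of memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone class; not the patch under review.
final class GlobalHintCounterSketch
{
    // Hypothetical limit, kept tiny so the behaviour is easy to observe.
    static final int MAX_HINTS_IN_PROGRESS = 2;

    private static final AtomicInteger hintsInProgress = new AtomicInteger(0);

    // Runs writeHint unless too many hint writes are already in flight.
    static boolean tryWriteHint(Runnable writeHint)
    {
        if (hintsInProgress.incrementAndGet() > MAX_HINTS_IN_PROGRESS)
        {
            hintsInProgress.decrementAndGet();   // undo the reservation and refuse
            return false;
        }
        try
        {
            writeHint.run();
            return true;
        }
        finally
        {
            hintsInProgress.decrementAndGet();   // always release, even on failure
        }
    }

    static int inProgress()
    {
        return hintsInProgress.get();
    }
}
```

A single counter bounds the total number of in-flight hints but cannot tell which endpoint is responsible for them.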
                
> local writes timing out cause attempt to hint to self
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3440
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.4
>
>         Attachments: 3440-v2.txt, 3440.txt
