Posted to commits@cassandra.apache.org by "Eric Evans (JIRA)" <ji...@apache.org> on 2012/06/28 00:32:44 UTC

[jira] [Commented] (CASSANDRA-4384) HintedHandoff can begin before SS knows the hostID

    [ https://issues.apache.org/jira/browse/CASSANDRA-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402628#comment-13402628 ] 

Eric Evans commented on CASSANDRA-4384:
---------------------------------------

This is a corner case that only affects a node that has been restarted (and has lost its hostId map as a result).  It occurs when hint delivery is triggered (by the _reception_ of a gossip message) before the hostId carried in that message has been processed.

I see two ways to fix this:

# Skip hint delivery when we don't (yet) have a hostId.
# Persist hostIds the way we do tokens.

(1) has the benefit of being a one-liner.  Presumably this code exists to schedule hint delivery for a remote host that was dead and is now alive.  Since in this case it is _us_ that was down, I don't think skipping delivery would be Evil.

(2) adds complexity, but guards against any future cases of an {{IEndpointStateChangeSubscriber.onAlive()}} relying on {{TokenMetadata.isMember()}} before looking up a hostId.
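
For illustration, option (1) might look roughly like the sketch below. This is not the actual Cassandra code: {{HintDeliveryGuard}}, {{recordHostId()}}, {{onAlive()}} and {{scheduledCount()}} are hypothetical stand-ins, and the in-memory map only approximates the hostId map that a restarted node has lost. The point is just the null check that skips scheduling until the hostId is known.

```java
import java.net.InetAddress;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the hint scheduling path; not Cassandra's real classes.
final class HintDeliveryGuard {
    // Approximates the endpoint -> hostId map, which is empty after a restart.
    private final Map<InetAddress, UUID> hostIds = new ConcurrentHashMap<>();
    private int scheduled = 0;

    // Called once the hostId has been processed from a gossip message.
    void recordHostId(InetAddress endpoint, UUID hostId) {
        hostIds.put(endpoint, hostId);
    }

    // Called when an endpoint is reported alive.
    // Returns true if hint delivery was scheduled, false if it was skipped.
    boolean onAlive(InetAddress endpoint) {
        UUID hostId = hostIds.get(endpoint);
        if (hostId == null) {
            // Option (1): no hostId yet, so skip instead of failing later
            // with a NullPointerException when the hostId is used.
            return false;
        }
        scheduled++; // placeholder for actually scheduling delivery
        return true;
    }

    int scheduledCount() {
        return scheduled;
    }
}
```

With this guard, an {{onAlive()}} call that arrives before {{recordHostId()}} is simply skipped, and delivery is only scheduled by a later call once the hostId is known.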


                
> HintedHandoff can begin before SS knows the hostID
> --------------------------------------------------
>
>                 Key: CASSANDRA-4384
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4384
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.2
>
>
> Since HH fires from the FD, SS won't quite have the hostId yet:
> {noformat}
>  INFO 18:58:04,196 Started hinted handoff for host: null with IP: /10.179.65.102
>  INFO 18:58:04,197 Node /10.179.65.102 state jump to normal
> ERROR 18:58:04,197 Exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.NullPointerException
>         at org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:120)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpointInternal(HintedHandOffManager.java:304)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:250)
>         at org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:87)
>         at org.apache.cassandra.db.HintedHandOffManager$4.runMayThrow(HintedHandOffManager.java:433)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:26)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Simple solution seems to be getting the hostId from gossip instead.
