Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2018/03/20 17:37:00 UTC

[jira] [Commented] (CASSANDRA-14155) [TRUNK] Gossiper somewhat frequently hitting an NPE on node startup with dtests at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)

    [ https://issues.apache.org/jira/browse/CASSANDRA-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406751#comment-16406751 ] 

Sam Tunnicliffe commented on CASSANDRA-14155:
---------------------------------------------

I'm not sure that the scenario above can happen quite as described. When {{loadRingState}} adds the endpoints to {{endpointStateMap}}, they're created with a brand new {{HeartBeatState}}, one with {{(generation, version) == (0, 0)}}. In {{Gossiper::examineGossiper}}, the empty digest list in a shadow SYN is replaced with a list containing one digest for every known endpoint, and these digests are also initialized with {{(0, 0)}}. So if a node were to finish its shadow round, load ring state, start gossip and immediately receive a shadow round SYN from a peer, its reply would not include any state for that peer, because the generation/version in the digest would match the one in the local epState.
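The digest comparison described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model: {{HeartBeat}}, {{Digest}}, {{loadRingState}}, {{expandShadowDigests}} and {{statesToSend}} here are hypothetical stand-ins, not Cassandra's real {{HeartBeatState}}, {{GossipDigest}} or {{examineGossiper}} code. It only shows that when every local state and every synthesized digest carry {{(0, 0)}}, nothing is considered newer, so no state is sent back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model with hypothetical names; NOT Cassandra's actual gossip classes.
final class ShadowSynSketch
{
    // Stand-in for HeartBeatState: a (generation, version) pair.
    record HeartBeat(int generation, int version) {}

    // Stand-in for a gossip digest: an endpoint plus its (generation, version).
    record Digest(String endpoint, int generation, int version) {}

    // Models loadRingState: each known endpoint gets a brand new (0, 0) heartbeat.
    static Map<String, HeartBeat> loadRingState(List<String> endpoints)
    {
        Map<String, HeartBeat> states = new LinkedHashMap<>();
        for (String ep : endpoints)
            states.put(ep, new HeartBeat(0, 0));
        return states;
    }

    // Models replacing an empty shadow-round digest list with one (0, 0) digest per known endpoint.
    static List<Digest> expandShadowDigests(Map<String, HeartBeat> local)
    {
        List<Digest> digests = new ArrayList<>();
        for (String ep : local.keySet())
            digests.add(new Digest(ep, 0, 0));
        return digests;
    }

    // Returns the endpoints whose local state is strictly newer than the digest (i.e. would be sent).
    static List<String> statesToSend(List<Digest> digests, Map<String, HeartBeat> local)
    {
        List<String> toSend = new ArrayList<>();
        for (Digest d : digests)
        {
            HeartBeat hb = local.get(d.endpoint());
            if (hb == null)
                continue;
            boolean newer = hb.generation() > d.generation()
                            || (hb.generation() == d.generation() && hb.version() > d.version());
            if (newer)
                toSend.add(d.endpoint());
        }
        return toSend;
    }

    public static void main(String[] args)
    {
        Map<String, HeartBeat> local = loadRingState(List.of("127.0.0.2", "127.0.0.3"));
        List<Digest> digests = expandShadowDigests(local);
        System.out.println(statesToSend(digests, local)); // prints []
    }
}
```

Only once a local state moves past {{(0, 0)}} (e.g. after the node starts heartbeating) would this simplified comparison report anything to send.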

Of course, though, the stack trace in the description certainly indicates that the epStates map obtained from the shadow round did contain a state for the node in question and that its {{HOST_ID}} appState is missing. So I'm all for adding the check & assertion error in {{isSafeForStartup}}, although I think we ought to log more detail there, probably the epStates map in its entirety. I'm less comfortable with changing the behaviour of the shadow round if we're not really clear on what's causing it. As we've only seen this sporadically in tests, how do you feel about adding the assertion (& any other error logging that may be useful) and seeing if that helps us track down the cause if/when we see the error in future test runs? My fear is that this is a symptom of a more pernicious race like the ones in CASSANDRA-13700 & CASSANDRA-11825.
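The proposed check could look roughly like the following minimal sketch. It assumes a hypothetical, simplified representation: endpoint states are plain maps from application-state name (e.g. {{"HOST_ID"}}) to value, only the {{HOST_ID}} comparison path is modelled, and {{java.util.logging}} stands in for Cassandra's own logger. {{StartupCheckSketch}} is an illustrative name, not the actual patch.

```java
import java.util.Map;
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical sketch: endpoint states are modelled as maps from
// application-state name (e.g. "HOST_ID") to its string value.
final class StartupCheckSketch
{
    private static final Logger logger = Logger.getLogger(StartupCheckSketch.class.getName());

    static boolean isSafeForStartup(String endpoint, UUID localHostId,
                                    Map<String, Map<String, String>> epStates)
    {
        Map<String, String> epState = epStates.get(endpoint);
        if (epState == null)
            return true; // no previous state for this endpoint, nothing to collide with

        String previous = epState.get("HOST_ID");
        if (previous == null)
        {
            // Instead of an NPE, fail with an explicit error that dumps every
            // state obtained from the shadow round, to help diagnose the cause.
            String msg = String.format("State for %s from shadow round has no HOST_ID; all states: %s",
                                       endpoint, epStates);
            logger.severe(msg);
            throw new AssertionError(msg);
        }
        // Safe only if the previously gossiped host ID matches our own.
        return UUID.fromString(previous).equals(localHostId);
    }

    public static void main(String[] args)
    {
        UUID localId = UUID.randomUUID();
        Map<String, Map<String, String>> states = Map.of("127.0.0.2", Map.of("STATUS", "NORMAL"));
        try
        {
            isSafeForStartup("127.0.0.2", localId, states);
        }
        catch (AssertionError e)
        {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Throwing an {{AssertionError}} with the full map keeps startup failing as it does today, but replaces the bare NPE with output that may show which peer supplied the incomplete state.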

> [TRUNK] Gossiper somewhat frequently hitting an NPE on node startup with dtests at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14155
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14155
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Kjellman
>            Assignee: Jason Brown
>            Priority: Major
>
> Gossiper is somewhat frequently hitting an NPE on node startup with dtests at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> {code}
> test teardown failure
> Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - Exception encountered during startup
> java.lang.NullPointerException: null
>         at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:621) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:568) ~[main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) [main/:na], ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - Exception encountered during startup
> java.lang.NullPointerException: null
>         at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:621) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:568) ~[main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) [main/:na]]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
