Posted to commits@cassandra.apache.org by "graham sanderson (JIRA)" <ji...@apache.org> on 2014/08/10 21:56:12 UTC
[jira] [Comment Edited] (CASSANDRA-7734) Schema pushes (seemingly) randomly not happening

    [ https://issues.apache.org/jira/browse/CASSANDRA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092169#comment-14092169 ] 

graham sanderson edited comment on CASSANDRA-7734 at 8/10/14 7:55 PM:
----------------------------------------------------------------------

Note this is not a problem _during_ the upgrade; it is a problem after the upgrade, with all nodes successfully on 2.0.9.

I'm a bit confused from a technical perspective, so would welcome any comments from others who have been near this code: [~iamaleksey], [~jbellis]

I'm not sure of the lifecycle of IncomingTcpConnection, but its close method contains this code:

{code}
MessagingService.instance().resetVersion(from);
{code}

That unsets the (statically scoped) version for an endpoint when a connection closes. I would assume there could be overlapping connections for an endpoint, so this seems undesirable?
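
To make the concern concrete, here is a minimal sketch, not the actual Cassandra code. The `VersionRegistry` class, the String endpoints, the version number and the use of a plain ConcurrentHashMap are all assumptions made for illustration; only the method names echo MessagingService.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the endpoint->version tracking in MessagingService.
public class VersionRegistry {
    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    // Called when an incoming connection announces its version.
    public void setVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }

    // Called from a connection's close(); forgets the version for the whole endpoint.
    public void resetVersion(String endpoint) {
        versions.remove(endpoint);
    }

    public boolean knowsVersion(String endpoint) {
        return versions.containsKey(endpoint);
    }

    public static void main(String[] args) {
        VersionRegistry registry = new VersionRegistry();
        String peer = "10.0.0.2"; // illustrative address

        registry.setVersion(peer, 7); // connection A opens
        registry.setVersion(peer, 7); // connection B opens, overlapping with A
        registry.resetVersion(peer);  // connection A closes

        // Connection B is still open, yet the version is no longer known.
        System.out.println("knowsVersion after A closes: " + registry.knowsVersion(peer));
    }
}
```

If connections to the same endpoint can overlap like this, the version would stay unknown until something sets it again.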

Also, in MigrationManager.announce:

{code}
MessagingService.instance().knowsVersion(endpoint) &&
MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version
{code}

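Since the endpoint->version mapping is static, global, and concurrently modified, we shouldn't be reading it twice: the entry could change or be removed between the knowsVersion() call and the getRawVersion() call.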
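
A single lookup would avoid that window. The following is only a sketch under assumptions: the `SingleReadCheck` class, the `hasCurrentVersion` helper, the String endpoints and the version value are hypothetical, not existing Cassandra code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a one-read "known and current" check.
public class SingleReadCheck {
    static final int CURRENT_VERSION = 7; // illustrative value
    static final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    // One map read: the value cannot change between the null check and the comparison.
    static boolean hasCurrentVersion(String endpoint) {
        Integer v = versions.get(endpoint);
        return v != null && v == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        versions.put("10.0.0.2", CURRENT_VERSION);
        System.out.println(hasCurrentVersion("10.0.0.2")); // known and current
        System.out.println(hasCurrentVersion("10.0.0.3")); // never seen
    }
}
```

With two separate calls, what happens when the entry disappears in between depends on how getRawVersion() handles a missing entry; a single read sidesteps the question.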

Also, CASSANDRA-6700 changes knowsVersion():

{code}
     public boolean knowsVersion(InetAddress endpoint)
     {
-        return versions.get(endpoint) != null;
+        return versions.containsKey(endpoint);
     }
{code}

However, it is not clear that the map can ever contain a null value, and the getVersion() method still does the check the old way (versions.get(endpoint) != null).
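
For what it's worth, if the map behaves like java.util.concurrent.ConcurrentHashMap (an assumption; the actual map type is not shown here), null values are rejected outright and the two forms of the check are equivalent. A small sketch, with a hypothetical `NullValueCheck` class and String keys:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: checks whether a map can hold a null value at all.
public class NullValueCheck {
    // Returns true if the map refuses a null value with a NullPointerException.
    static boolean rejectsNullValue(ConcurrentMap<String, Integer> map) {
        try {
            map.put("10.0.0.2", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();
        System.out.println("rejects null value: " + rejectsNullValue(versions));

        // With null values impossible, containsKey(k) and get(k) != null agree.
        String key = "10.0.0.2";
        System.out.println(versions.containsKey(key) == (versions.get(key) != null));
    }
}
```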

In any case, I'm partly confused because I'm not quite sure how this endpoint version tracking is supposed to work, and the current state seems to have evolved as a result of lots of different issues (I don't think I've captured all of them here).


> Schema pushes (seemingly) randomly not happening
> ------------------------------------------------
>
>                 Key: CASSANDRA-7734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7734
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: graham sanderson
>
> We have been seeing problems since upgrading from 2.0.5 to 2.0.9.
> Basically, after a while, new schema changes (we periodically add tables) start propagating very slowly to some nodes and fast to others. It looks from the logs and trace that in this case the "push" of the schema from the originating node to some of the other nodes never happens (note that once a node has decided not to push to another node, it doesn't seem to start again). In this case, though, we do see the other node end up pulling the schema some time later, when it notices its schema is out of date.
> Here is the code from 2.0.9 MigrationManager.announce:
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             // only push schema to nodes with known and equal versions
>             if (!endpoint.equals(FBUtilities.getBroadcastAddress()) &&
>                     MessagingService.instance().knowsVersion(endpoint) &&
>                     MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version)
>                 pushSchemaMutation(endpoint, schema);
>         }
> {code}
> and from 2.0.5
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             if (endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 continue; // we've dealt with localhost already
>             // don't send schema to the nodes with the versions older than current major
>             if (MessagingService.instance().getVersion(endpoint) < MessagingService.current_version)
>                 continue;
>             pushSchemaMutation(endpoint, schema);
>         }
> {code}
> The old getVersion() call would return MessagingService.current_version if the version was unknown, so the push would occur in that case. I don't have logging to prove this, but I strongly suspect that the version may end up unknown (null) in some cases, which would have allowed schema propagation in 2.0.5 but blocks it as of some release after that and no later than 2.0.9.
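
The difference between the two conditions quoted above can be sketched in isolation. This is an illustration under assumptions, not the Cassandra code: the `PushDecision` class, the String endpoints, the plain HashMap and the version value are all hypothetical, and the old-style check models only the described behavior of treating an unknown version as current.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch comparing the two push conditions for an unknown version.
public class PushDecision {
    static final int CURRENT_VERSION = 7; // illustrative value

    // 2.0.5-style: an unknown version is treated as current, so the push happens.
    static boolean pushOld(Map<String, Integer> versions, String endpoint) {
        Integer v = versions.get(endpoint);
        int effective = (v == null) ? CURRENT_VERSION : v;
        return effective >= CURRENT_VERSION;
    }

    // 2.0.9-style: the version must be known and equal to the current one.
    static boolean pushNew(Map<String, Integer> versions, String endpoint) {
        Integer v = versions.get(endpoint);
        return v != null && v == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        Map<String, Integer> versions = new HashMap<>(); // no entry for the peer
        String peer = "10.0.0.2"; // illustrative address

        System.out.println("old condition pushes: " + pushOld(versions, peer));
        System.out.println("new condition pushes: " + pushNew(versions, peer));
    }
}
```

The two conditions only disagree when the version is unknown, which is the case suspected in the description.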


