Posted to commits@cassandra.apache.org by "Sasha Dolgy (JIRA)" <ji...@apache.org> on 2011/06/14 11:12:53 UTC

[jira] [Created] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

AntiEntropyService excluding nodes that are on version 0.7 or sooner
--------------------------------------------------------------------

                 Key: CASSANDRA-2768
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.0
         Environment: 4 node environment -- 

Originally 0.7.6-2 with a Keyspace defined with RF=3

Upgraded all nodes ( 1 at a time ) to version 0.8.0:  For each node, the node was shut down, new version was turned on, using the existing data files / directories and a nodetool repair was run.  
            Reporter: Sasha Dolgy


When I run nodetool repair on any of the nodes, the /var/log/cassandra/system.log reports errors similar to:

INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime]
java.util.ConcurrentModificationException
      at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
      at java.util.HashMap$KeyIterator.next(HashMap.java:828)
      at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
      at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)

The INFO message and subsequent ERROR message are logged for 2 nodes .. I suspect that this is because RF=3.  
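
The RF=3 suspicion can be checked with a little ring arithmetic. The sketch below assumes a simple clockwise replica placement, where a range lives on its owner plus the next RF-1 nodes on the ring; the replication strategy actually in use is not stated in this report. The node names and the replicasFor/neighborsFor helpers are hypothetical. Under that assumption every range in a 4 node ring with RF=3 has exactly two other replicas, which would be consistent with the message being logged for two endpoints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RingNeighborsSketch {
    // Assumption: replicas for a range are its owner plus the next rf-1
    // nodes walking clockwise around the ring (wrapping at the end).
    static List<String> replicasFor(List<String> ring, int ownerIndex, int rf) {
        List<String> replicas = new ArrayList<>();
        int count = Math.min(rf, ring.size());
        for (int i = 0; i < count; i++) {
            replicas.add(ring.get((ownerIndex + i) % ring.size()));
        }
        return replicas;
    }

    // Neighbors for repair purposes: every replica of the range except the owner.
    static List<String> neighborsFor(List<String> ring, int ownerIndex, int rf) {
        List<String> neighbors = replicasFor(ring, ownerIndex, rf);
        neighbors.remove(ring.get(ownerIndex));
        return neighbors;
    }

    public static void main(String[] args) {
        // Hypothetical 4 node ring, listed in token order.
        List<String> ring = Arrays.asList("A", "B", "C", "D");
        System.out.println(neighborsFor(ring, 0, 3));
    }
}
```

With the hypothetical ring above, main prints [B, C]: two neighbors for the range owned by A.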

nodetool ring shows that all nodes are up.  

Client connections (read / write) are not having issues..  

nodetool version on all nodes shows that each node is 0.8.0

At suggestion of some contributors, I have restarted each node and tried to run a nodetool repair again ... the result is the same with the messages being logged.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049421#comment-13049421 ] 

Sasha Dolgy commented on CASSANDRA-2768:
----------------------------------------

After dropping the old keyspace, and creating the new keyspace .. I didn't cycle through and stop / start each node.  Have just done that now and all appears to be fine.  No more errors in the logs.  This problem was 100% there, on version 0.8.0 with the old keyspace.  


        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Héctor Izquierdo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058061#comment-13058061 ] 

Héctor Izquierdo commented on CASSANDRA-2768:
---------------------------------------------

I'm on 0.8.1, updating from 0.7.6-2, and I have stumbled upon this bug. I can't run repair on a node whose disk broke:

INFO [manual-repair-02182a20-5659-4aa0-aab9-2fff430f8a71] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-6b50f51a-f689-4825-bcb9-bebf68664117] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-6b50f51a-f689-4825-bcb9-bebf68664117] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 179) Excluding /10.20.13.77 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-02182a20-5659-4aa0-aab9-2fff430f8a71] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-c46dd589-ed22-4b7a-809c-d97c094d2354] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-c46dd589-ed22-4b7a-809c-d97c094d2354] 2011-06-30 20:29:51,488 AntiEntropyService.java (line 179) Excluding /10.20.13.79 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
 INFO [manual-repair-02182a20-5659-4aa0-aab9-2fff430f8a71] 2011-06-30 20:29:51,488 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: manual-repair-02182a20-5659-4aa0-aab9-2fff430f8a71 completed.
 INFO [manual-repair-6b50f51a-f689-4825-bcb9-bebf68664117] 2011-06-30 20:29:51,487 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: manual-repair-6b50f51a-f689-4825-bcb9-bebf68664117 completed.
 INFO [manual-repair-c46dd589-ed22-4b7a-809c-d97c094d2354] 2011-06-30 20:29:51,488 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: manual-repair-c46dd589-ed22-4b7a-809c-d97c094d2354 completed.


All nodes are on 0.8.1


       

[jira] [Resolved] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2768.
---------------------------------------

    Resolution: Cannot Reproduce
      Assignee:     (was: Brandon Williams)


        

[jira] [Assigned] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2768:
-----------------------------------------

    Assignee: Sylvain Lebresne


        

[jira] [Assigned] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2768:
-----------------------------------------

    Assignee: Brandon Williams  (was: Sylvain Lebresne)


        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049078#comment-13049078 ] 

Sylvain Lebresne commented on CASSANDRA-2768:
---------------------------------------------

The key point here is that this is not a repair-specific thing per se. The relevant part of the log output is the 'Excluding ...' message, not the stack trace.
It is triggered because of the following code in AES.getNeighbors:
{noformat}
  if (Gossiper.instance.getVersion(endpoint) <= MessagingService.VERSION_07)
  {
      logger.info("Excluding " + endpoint + " from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.");
      neighbors.remove(endpoint);
  }
{noformat}
Since Sasha has reportedly verified that all nodes report being on 0.8.0, this suggests a Gossiper bug that reports the wrong version (even after node restarts).

The exception itself has been fixed in CASSANDRA-2767 and should not be the focus of attention here.
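
For readers wondering how the quoted check could also produce the ConcurrentModificationException: the enclosing loop is not shown in this thread, but if the check runs inside a for-each loop over neighbors (an assumption, although the HashMap$KeyIterator.next frame in the reported trace is consistent with it), then removing from the set invalidates the iterator. The sketch below is not the CASSANDRA-2767 patch. It is a minimal illustration with hypothetical names (NeighborFilterSketch, a stand-in VERSION_07 value, a plain map in place of Gossiper) showing the failing pattern next to one common alternative, Iterator.remove.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class NeighborFilterSketch {
    // Hypothetical stand-in; the real MessagingService.VERSION_07 value is not shown in this thread.
    static final int VERSION_07 = 1;

    // Same shape as the quoted snippet, assuming it sits inside a for-each loop:
    // the set is modified while its iterator is still in use.
    static Set<String> unsafeFilter(Set<String> neighbors, Map<String, Integer> versions) {
        for (String endpoint : neighbors) {
            if (versions.get(endpoint) <= VERSION_07) {
                neighbors.remove(endpoint); // structural change during iteration
            }
        }
        return neighbors;
    }

    // One common alternative: remove through the iterator itself.
    static Set<String> safeFilter(Set<String> neighbors, Map<String, Integer> versions) {
        Iterator<String> it = neighbors.iterator();
        while (it.hasNext()) {
            if (versions.get(it.next()) <= VERSION_07) {
                it.remove();
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Made-up endpoints and versions, for illustration only.
        Map<String, Integer> versions = new HashMap<>();
        versions.put("node-a", 1);
        versions.put("node-b", 1);
        versions.put("node-c", 2);

        Set<String> neighbors = new HashSet<>(versions.keySet());
        System.out.println("safe filter keeps: " + safeFilter(neighbors, versions));

        try {
            unsafeFilter(new HashSet<>(versions.keySet()), versions);
            System.out.println("unsafe filter happened not to fail this time");
        } catch (ConcurrentModificationException e) {
            System.out.println("unsafe filter threw " + e.getClass().getSimpleName());
        }
    }
}
```

Whether the unsafe variant throws depends on which element is removed and where it falls in iteration order, which may be why the error is not seen on every node.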


        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049243#comment-13049243 ] 

Sasha Dolgy commented on CASSANDRA-2768:
----------------------------------------

Hi ... able to give more information now:


cassandra:~$ nodetool ring
Address         Status State   Load            Owns    Token
                                                       170141183460469231731687303715884105726
10.128.103.148  Up     Normal  961.38 KB       11.22%  19095547144942516281182777765338228798
10.128.94.227   Up     Normal  667.56 KB       22.11%  56713727820156410577229101238628035242
10.128.34.18    Up     Normal  688.1 KB        33.33%  113427455640312821154458202477256070484
10.128.90.109   Up     Normal  965.76 KB       33.33%  170141183460469231731687303715884105726

Not a lot of data.  I created a new keyspace with (RF=2), dropped the old one.  Ran repair on the nodes, and now I no longer get the error on some of the nodes. 

I can confirm again all systems are reporting:  ReleaseVersion: 0.8.0  from 'nodetool version'

I am seeing this error on two of the nodes:   

 ERROR [pool-2-thread-14] 2011-06-14 23:33:40,544 CustomTThreadPoolServer.java (line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [pool-2-thread-16] 2011-06-14 23:33:42,024 CustomTThreadPoolServer.java (line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

109 and 148 look to be communicating fine.

18 --> 109 (version error)
18 --> 227 (version error)
227 --> 18 (version error)
227 --> 148 (version error)

For my sanity, I checked and confirmed that all four instances are part of the same security group and that there are firewall rules allowing communication between all four nodes on ports 7000 and 9090.

Configuration on all nodes is standard with the following exceptions:


#listen_address: localhost
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch



        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049326#comment-13049326 ] 

Brandon Williams commented on CASSANDRA-2768:
---------------------------------------------

bq. Since Sasha has reportedly verified that all nodes report being on 0.8.0, this suggests a Gossiper bug that reports the wrong version (even after node restarts).

Gossiper's setVersion and getVersion are fairly straightforward: setVersion is called in IncomingTcpConnection and sets the version to whatever the remote node reported, so a bug here looks unlikely.  The version information is not persisted anywhere, so the remote node has to be indicating it is 0.7.  I was unable to reproduce following Sasha's steps, so I think the most likely explanation here is that a node is mistakenly still on 0.7.

{quote}
I am seeing this error on two of the nodes:

ERROR [pool-2-thread-14] 2011-06-14 23:33:40,544 CustomTThreadPoolServer.java (line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
{quote}

This is indicative of a client-side thrift compatibility problem and is unrelated, as thrift is not used for internode communication.
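
The argument about version tracking above can be pictured with a small sketch. It does not reproduce Gossiper's real code: the class name VersionRegistrySketch is hypothetical, setVersion and getVersion are reused only for readability, endpoints are plain strings, and returning an empty OptionalInt for an unknown peer is an assumption made for the example. The point it illustrates is only that a purely in-memory map starts empty after a restart, so any value it later holds must have come from what a peer announced.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

final class VersionRegistrySketch {
    // Held in memory only; nothing is written to disk, so a restart loses it.
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // Would be called when a peer announces its version on an incoming connection.
    void setVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }

    // Assumption: unknown peers yield an empty result rather than a default version.
    OptionalInt getVersion(String endpoint) {
        Integer version = versions.get(endpoint);
        return version == null ? OptionalInt.empty() : OptionalInt.of(version);
    }

    public static void main(String[] args) {
        VersionRegistrySketch registry = new VersionRegistrySketch();
        registry.setVersion("node-a", 2); // made-up endpoint and version
        System.out.println(registry.getVersion("node-a"));

        // A fresh instance models a restarted process: nothing is remembered.
        System.out.println(new VersionRegistrySketch().getVersion("node-a").isPresent());
    }
}
```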


        

[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049214#comment-13049214 ] 

Sasha Dolgy commented on CASSANDRA-2768:
----------------------------------------

I'd like to add that we are also using Ec2Snitch across the four nodes, although all four nodes are in the Singapore EC2 region.

> AntiEntropyService excluding nodes that are on version 0.7 or sooner
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-2768
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: 4 node environment -- 
> Originally 0.7.6-2 with a Keyspace defined with RF=3
> Upgraded all nodes ( 1 at a time ) to version 0.8.0:  For each node, the node was shut down, new version was turned on, using the existing data files / directories and a nodetool repair was run.  
>            Reporter: Sasha Dolgy
>            Assignee: Sylvain Lebresne


[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049411#comment-13049411 ] 

Sasha Dolgy commented on CASSANDRA-2768:
----------------------------------------

I can assure you they are all on 0.8.0, if the output of "nodetool version" is correct:

cassandra@ip-10-128-34-18:~$ nodetool -h 10.128.34.18 -p 9090 ring
Address         Status State   Load            Owns    Token
                                                       170141183460469231731687303715884105726
10.128.103.148  Up     Normal  1.02 MB         11.22%  19095547144942516281182777765338228798
10.128.94.227   Up     Normal  667.56 KB       22.11%  56713727820156410577229101238628035242
10.128.34.18    Up     Normal  688.1 KB        33.33%  113427455640312821154458202477256070484
10.128.90.109   Up     Normal  1.11 MB         33.33%  170141183460469231731687303715884105726
cassandra@ip-10-128-34-18:~$

cassandra@ip-10-128-34-18:~$ nodetool -h 10.128.34.18 -p 9090 version
ReleaseVersion: 0.8.0
cassandra@ip-10-128-34-18:~$


cassandra@ip-10-128-94-227:~$ nodetool -h 10.128.94.227 -p 9090 version
ReleaseVersion: 0.8.0
cassandra@ip-10-128-94-227:~$


cassandra@ip-10-128-90-109:~$ nodetool -h 10.128.90.109 -p 9090 version
ReleaseVersion: 0.8.0
cassandra@ip-10-128-90-109:~$


cassandra@ip-10-128-103-148:~$ nodetool -h 10.128.103.148 -p 9090 version
ReleaseVersion: 0.8.0
cassandra@ip-10-128-103-148:~$
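
For readers following the stack trace quoted in this thread: a ConcurrentModificationException raised from HashMap$KeyIterator.next is what Java throws when a HashSet is structurally modified while a for-each loop is still iterating over it. The sketch below is not the Cassandra source. It is a minimal, self-contained illustration, assuming a neighbor set is filtered in place by a per-endpoint check. The class name NeighborFilterSketch, both method names, and the use of plain strings for endpoints are hypothetical.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Cassandra.
final class NeighborFilterSketch {

    // Removes excluded endpoints inside a for-each loop over the same set.
    // Returns true if the loop failed with ConcurrentModificationException.
    static boolean unsafeFilterThrows(Set<String> neighbors, Set<String> excluded) {
        try {
            for (String endpoint : neighbors) {
                if (excluded.contains(endpoint)) {
                    // Structural modification behind the iterator's back.
                    neighbors.remove(endpoint);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes excluded endpoints through the iterator, which is the
    // supported way to delete elements during iteration.
    static void safeFilter(Set<String> neighbors, Set<String> excluded) {
        Iterator<String> it = neighbors.iterator();
        while (it.hasNext()) {
            if (excluded.contains(it.next())) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> excluded = new HashSet<>(Arrays.asList("10.128.34.18", "10.128.90.109"));

        Set<String> first = new HashSet<>(excluded);
        System.out.println("for-each removal threw: " + unsafeFilterThrows(first, excluded));

        Set<String> second = new HashSet<>(excluded);
        safeFilter(second, excluded);
        System.out.println("remaining after iterator removal: " + second.size());
    }
}
```

With two neighbors that both fail the check, the for-each variant removes the first one and then throws on the next call to next(), while the iterator-based variant removes both cleanly. Whether this is the actual cause in AntiEntropyService.getNeighbors would need to be confirmed against the 0.8.0 source.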


> AntiEntropyService excluding nodes that are on version 0.7 or sooner
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-2768
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: 4 node environment -- 
> Originally 0.7.6-2 with a Keyspace defined with RF=3
> Upgraded all nodes ( 1 at a time ) to version 0.8.0:  For each node, the node was shut down, new version was turned on, using the existing data files / directories and a nodetool repair was run.  
>            Reporter: Sasha Dolgy
>            Assignee: Brandon Williams
