Posted to commits@cassandra.apache.org by "Marcus Eriksson (JIRA)" <ji...@apache.org> on 2011/09/09 14:47:08 UTC

[jira] [Created] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Rolling upgrades from 0.7 to 0.8 not possible
---------------------------------------------

                 Key: CASSANDRA-3166
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.4, 0.7.9, 0.7.5
            Reporter: Marcus Eriksson


We are in the process of upgrading to 0.8 and need to do a rolling upgrade. This fails miserably, and it is reproducible:

1. Set up a 3-node cluster with 0.7.9 and rf=3; read and write at QUORUM.
2. Upgrade one of the nodes (I upgraded a seed node; not sure if that is important).
3. Continue reading/writing.
4. See the logs on the 0.7 node fill up with: INFO 12:36:08,240 Received connection from newer protocol version. Ignorning message.


It does work if I start the 0.7.9 nodes *after* the 0.8.4 node, which makes me think it matters whether the 0.8 node connects to the 0.7 nodes or the other way round.

Debug logging on the 0.8 node shows:
/var/log/cassandra/system.log.9:DEBUG [pool-2-thread-82] 2011-09-09 11:55:06,067 StorageProxy.java (line 178) Write timeout java.util.concurrent.TimeoutException for one (or more) of: 
/var/log/cassandra/system.log.9:DEBUG [pool-2-thread-76] 2011-09-09 11:55:06,067 StorageProxy.java (line 584) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /193.182.3.92,  .

Nothing except the "newer protocol version..." message shows up in the 0.7 logs.

I will continue to look at this issue, but if anyone has a quick patch, let me know.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101181#comment-13101181 ] 

Marcus Eriksson commented on CASSANDRA-3166:
--------------------------------------------

Note that it fails all the way to the client as well: we see timeouts in Hector.


[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101264#comment-13101264 ] 

Jonathan Ellis commented on CASSANDRA-3166:
-------------------------------------------

That's right.  This is why when we do get a message from a newer-version host we make sure to add it to gossiper so we connect back to it.

Not sure if that fix got applied to 0.7 -- if not, making the 0.8 node a seed should work around it.


[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102254#comment-13102254 ] 

Marcus Eriksson commented on CASSANDRA-3166:
--------------------------------------------

Minimal patch attached

Clear the version in IncomingTcpConnection instead, since that is the one setting it. Before, we could end up in a state where the outgoing connections got closed but the incoming one was still up, meaning the version was reset and it was never possible to get the version set again.

Now it is IncomingTcpConnection's responsibility to keep track of versions. If that connection is closed, we are bound to get a new incoming connection and therefore set the version correctly.
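
To make the ownership change concrete, here is a minimal sketch of the idea. It is not the attached patch and does not use the real Cassandra classes: VersionRegistry, IncomingConnection, OutboundConnection, the peer name and the version numbers 1 and 2 are all hypothetical stand-ins. The flag on close() only exists to show the old behaviour next to the new one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the ownership change; not the real Cassandra classes or the attached patch. */
class VersionOwnershipSketch {
    static final int CURRENT_VERSION = 2; // illustrative number standing in for "0.8"
    static final int OLD_VERSION = 1;     // illustrative number standing in for "0.7"

    /** Per-peer protocol versions; a peer with no entry is assumed to speak our own version. */
    static final class VersionRegistry {
        private final Map<String, Integer> versions = new ConcurrentHashMap<>();
        void set(String peer, int version) { versions.put(peer, version); }
        void reset(String peer) { versions.remove(peer); }
        int get(String peer) { return versions.getOrDefault(peer, CURRENT_VERSION); }
    }

    /** Incoming side: records the peer's version when opened and clears it when closed. */
    static final class IncomingConnection {
        private final VersionRegistry registry;
        private final String peer;

        IncomingConnection(VersionRegistry registry, String peer, int announcedVersion) {
            this.registry = registry;
            this.peer = peer;
            registry.set(peer, announcedVersion);
        }

        void close() { registry.reset(peer); }
    }

    /** Outgoing side: the flag switches between the old behaviour (reset on close) and the new one. */
    static final class OutboundConnection {
        private final VersionRegistry registry;
        private final String peer;

        OutboundConnection(VersionRegistry registry, String peer) {
            this.registry = registry;
            this.peer = peer;
        }

        void close(boolean resetVersionOnClose) {
            if (resetVersionOnClose) {
                registry.reset(peer); // old behaviour: forgets a version that only the incoming side can set
            }
        }
    }

    public static void main(String[] args) {
        VersionRegistry registry = new VersionRegistry();
        IncomingConnection in = new IncomingConnection(registry, "old-node", OLD_VERSION);
        OutboundConnection out = new OutboundConnection(registry, "old-node");

        out.close(false); // new behaviour: the learned version survives
        System.out.println("after outbound close (new behaviour): " + registry.get("old-node"));

        out.close(true);  // old behaviour: version lost while the incoming connection is still up
        System.out.println("after outbound close (old behaviour): " + registry.get("old-node"));

        in.close();
    }
}
```

With the reset tied to the incoming side, the version can only be forgotten at a point where a fresh incoming connection is expected to set it again.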



> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>             Fix For: 0.8.4
>

[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101327#comment-13101327 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

I'm having difficulty coming up with a clean yet simple fix here. Reverting CASSANDRA-2860 certainly fixes this problem, but re-introduces CASSANDRA-2860 instead.

I could imagine an environment variable/config option to disable the support for pretending you are older than you are, which could be used in a second round of rolling restarts after upgrading all nodes of a cluster to 0.8. A JMX tweakable setting would be nice, but upon changing it you'd want to tear down all the TCP connections to re-initiate versioning negotiation so maybe it's okay to leave it with an extra round of restarts required.
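
For illustration only, the switch described above might look roughly like the sketch below. The property name example.disable_version_downgrade, the class name and the version numbers are invented here; nothing like this exists in Cassandra as far as this thread shows.

```java
/** Hypothetical sketch of the proposed switch; the property name is invented for illustration. */
class CompatSwitchSketch {
    static final int CURRENT_VERSION = 2; // illustrative number standing in for "0.8"
    static final String DISABLE_PROPERTY = "example.disable_version_downgrade"; // hypothetical name

    /** Version to speak to a peer: the peer's older version, unless downgrading is switched off. */
    static int versionToSpeak(int peerVersion, boolean downgradeDisabled) {
        if (downgradeDisabled) {
            return CURRENT_VERSION;
        }
        return Math.min(peerVersion, CURRENT_VERSION);
    }

    public static void main(String[] args) {
        // Boolean.getBoolean is true only when the system property is set to "true",
        // e.g. by starting the JVM with -Dexample.disable_version_downgrade=true
        boolean disabled = Boolean.getBoolean(DISABLE_PROPERTY);
        System.out.println("speaking version " + versionToSpeak(1, disabled) + " to a version-1 peer");
    }
}
```

Because the value is read once at startup, changing it means restarting the node, which matches the extra round of rolling restarts mentioned above.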

Alternatively, I think (not tested) things will tend to sort themselves out incrementally every time you restart a 0.8 node, since it will tend to initiate connections to other nodes immediately, but documenting for users that they need to restart nodes all over the place until everyone seems to have gotten it seems like a poor solution.

Adding some new kind of message that says "i really am this other version" or similar isn't clean.

Am I missing a much simpler and cleaner fix here?



[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101286#comment-13101286 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

Sorry, epic fail on my part. Removing the version reset does have an effect, and the 0.7 node is in fact connecting. I made a boo-boo involving running from the wrong working copy...



[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101208#comment-13101208 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

Are you seeing any "Version for $IP is $VERSION" output when running in debug mode? IncomingTcpConnection.run() should be logging that.


[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102257#comment-13102257 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

+1 on my end. That's a very simple solution that I wasn't seeing. Can't figure out a way it will break anything.

* 0.7 <-> 0.7: No version mismatch ever, no reset ever happens. All is well.
* 0.8 <-> 0.8: Same.
* 0.7 <-> 0.8: 0.8 -> 0.7 will be killed (streaming) or retained but messages ignored (messaging). 0.7 -> 0.8 will work, and 0.8 will know the version of 0.7. Future outgoing will use correct version, and the pre-existing messaging connection starts sending messages at a version that isn't ignored.
* 0.7 node restarted and upgraded to 0.8 talking to 0.8: Both incoming/outgoing go down, so version reset, then equivalent of 0.8 <-> 0.8.
* 0.7 node restarted and upgraded to 0.8 talking to 0.7: Both incoming/outgoing go down, so version reset, then equivalent of 0.7 <-> 0.8.



> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>             Fix For: 0.8.4
>
>         Attachments: 3166.txt

[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101214#comment-13101214 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

Currently suspecting 6eb154ba5616e0df3ce4f11c88dbb1c92d317465 which adds the version reset on disconnection.



[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101259#comment-13101259 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

Removing the resetVersion() did not help. I added some logging to IncomingTcpConnection and it seems that when the 0.8 node goes up first, the 0.7 node never tries to make an outgoing connection to it.

If my understanding is correct, from reading CASSANDRA-2818 and looking at the code, I think the intent is that we discover the version of the other guy whenever that guy connects to *us*; we can never find out that the other side has a mis-matched version based on activity on the outbound connection.

So, incoming connections would be a necessity in order for the 0.8 node to ever adjust its lingo.
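
A toy model of that understanding, in case it helps. The Node class, the node names and the version numbers 1 and 2 are hypothetical and only mirror the behaviour described in this thread (the receiver ignores messages from a newer version; the sender learns a peer's version only when the peer connects to it). It is not the actual messaging code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of version discovery via incoming connections only. */
class InboundDiscoverySketch {
    static final class Node {
        final String name;
        final int ownVersion;
        private final Map<String, Integer> knownVersions = new HashMap<>();

        Node(String name, int ownVersion) {
            this.name = name;
            this.ownVersion = ownVersion;
        }

        /** Called when a peer opens a connection to us; the only place a peer's version is learned. */
        void acceptConnectionFrom(Node peer) {
            knownVersions.put(peer.name, peer.ownVersion);
        }

        /** Version stamped on messages to a peer: our own, unless we learned the peer is older. */
        int versionFor(Node peer) {
            return Math.min(ownVersion, knownVersions.getOrDefault(peer.name, ownVersion));
        }

        /** Receiver-side rule: messages from a newer protocol version are ignored. */
        boolean accepts(int messageVersion) {
            return messageVersion <= ownVersion;
        }

        /** Returns true if the peer would process a message we send right now. */
        boolean send(Node peer) {
            return peer.accepts(versionFor(peer));
        }
    }

    public static void main(String[] args) {
        Node oldNode = new Node("node-0.7", 1);
        Node newNode = new Node("node-0.8", 2);

        // The newer node has not seen an incoming connection yet, so it speaks its own version.
        System.out.println("before incoming connection, message processed: " + newNode.send(oldNode));

        // The older node connects; the newer node now knows which version to speak.
        newNode.acceptConnectionFrom(oldNode);
        System.out.println("after incoming connection, message processed: " + newNode.send(oldNode));
    }
}
```

In this model nothing on the sending path ever updates knownVersions, which is the point: if the older node never connects, the newer node has no way to find out that it should downgrade.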


[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102845#comment-13102845 ] 

Hudson commented on CASSANDRA-3166:
-----------------------------------

Integrated in Cassandra-0.8 #322 (See [https://builds.apache.org/job/Cassandra-0.8/322/])
    Make IncomingTcpConnection responsible for version handling.
Patch by Marcus Eriksson, reviewed by Peter Schuller and brandonwilliams
for CASSANDRA-3166

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1169823
Files : 
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/OutboundTcpConnection.java


> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>             Fix For: 0.8.6
>
>         Attachments: 3166.txt

[jira] [Issue Comment Edited] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102257#comment-13102257 ] 

Peter Schuller edited comment on CASSANDRA-3166 at 9/11/11 10:34 AM:
---------------------------------------------------------------------

+1 on my end. That's a very simple solution that I wasn't seeing. Can't figure out a way it will break anything.

* 0.7 <-> 0.7: No version mismatch ever, no reset ever happens. All is well.
* 0.8 <-> 0.8: Same.
* 0.7 <-> 0.8: 0.8 -> 0.7 will be killed (streaming) or retained but messages ignored (messaging). 0.7 -> 0.8 will work, and 0.8 will know the version of 0.7. Future outgoing will use correct version, and the pre-existing messaging connection starts sending messages at a version that isn't ignored.
* 0.7 node restarted and upgraded to 0.8 talking to 0.8: Both incoming/outgoing go down, so version reset, then equivalent of 0.8 <-> 0.8.
* 0.7 node restarted and upgraded to 0.8 talking to 0.7: Both incoming/outgoing go down, so version reset, then equivalent of 0.7 <-> 0.8.




[jira] [Commented] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101223#comment-13101223 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

And that commit was due to CASSANDRA-2818 and CASSANDRA-2860.

> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>
> We are in the process of upgrading to 0.8 and need to do a rolling upgrade. This fails, and the failure is reproducible:
> 1. Set up a 3-node cluster with 0.7.9 and rf=3; read and write at QUORUM.
> 2. Upgrade one of the nodes (I upgraded a seed node; not sure whether that matters).
> 3. Continue reading/writing.
> 4. See the logs on the 0.7 node fill up with: INFO 12:36:08,240 Received connection from newer protocol version. Ignorning message.
> It does work if I start the 0.7.9 nodes *after* the 0.8.4 node, which makes me think it matters whether the 0.8 node connects to the 0.7 nodes or the other way round.
> Debug logging on the 0.8 node shows:
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-82] 2011-09-09 11:55:06,067 StorageProxy.java (line 178) Write timeout java.util.concurrent.TimeoutException for one (or more) of: 
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-76] 2011-09-09 11:55:06,067 StorageProxy.java (line 584) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /193.182.3.92,  .
> Nothing except the "newer protocol version..." message appears in the 0.7 logs.
> I will continue to look at this issue, but if anyone has a quick patch, let me know.


[jira] [Updated] (CASSANDRA-3166) Rolling upgrades from 0.7 to 0.8 not possible

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-3166:
---------------------------------------

    Attachment: 3166.txt

Make IncomingTcpConnection responsible for version handling
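
As a rough illustration only (this is not the contents of 3166.txt), receiver-side version handling could look something like the sketch below. The class name, the frame layout (4-byte version, 4-byte length, payload) and MY_VERSION are all hypothetical and do not reflect Cassandra's actual wire format. The idea shown is that the incoming connection records the version each peer speaks, so the rest of the node can adapt, and skips only the frames it cannot decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Cassandra's IncomingTcpConnection.
public class IncomingVersionSketch {
    // Assumed protocol version spoken by this node.
    static final int MY_VERSION = 2;

    // Last protocol version seen from each peer, keyed by peer address.
    final Map<String, Integer> peerVersions = new HashMap<>();

    // Builds one frame in the assumed layout: version, payload length, payload.
    static byte[] frame(int version, String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + payload.length)
                .putInt(version)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Reads every complete frame from the buffer. The peer's version is
    // always recorded; frames newer than MY_VERSION are skipped because
    // their payload cannot be decoded here.
    List<String> receive(String peer, ByteBuffer in) {
        List<String> delivered = new ArrayList<>();
        while (in.remaining() >= 8) {
            int version = in.getInt();
            byte[] payload = new byte[in.getInt()];
            in.get(payload);
            peerVersions.put(peer, version);
            if (version > MY_VERSION) {
                continue; // newer protocol: skip this frame, keep the connection
            }
            delivered.add(new String(payload, StandardCharsets.UTF_8));
        }
        return delivered;
    }
}
```

With the version recorded per peer, a sender could consult that map and fall back to the older format when talking to a node that has not been upgraded yet.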

> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>             Fix For: 0.8.4
>
>         Attachments: 3166.txt
>
>
> We are in the process of upgrading to 0.8 and need to do a rolling upgrade. This fails, and the failure is reproducible:
> 1. Set up a 3-node cluster with 0.7.9 and rf=3; read and write at QUORUM.
> 2. Upgrade one of the nodes (I upgraded a seed node; not sure whether that matters).
> 3. Continue reading/writing.
> 4. See the logs on the 0.7 node fill up with: INFO 12:36:08,240 Received connection from newer protocol version. Ignorning message.
> It does work if I start the 0.7.9 nodes *after* the 0.8.4 node, which makes me think it matters whether the 0.8 node connects to the 0.7 nodes or the other way round.
> Debug logging on the 0.8 node shows:
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-82] 2011-09-09 11:55:06,067 StorageProxy.java (line 178) Write timeout java.util.concurrent.TimeoutException for one (or more) of: 
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-76] 2011-09-09 11:55:06,067 StorageProxy.java (line 584) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /193.182.3.92,  .
> Nothing except the "newer protocol version..." message appears in the 0.7 logs.
> I will continue to look at this issue, but if anyone has a quick patch, let me know.
