Posted to user@cassandra.apache.org by Arya Goudarzi <go...@gmail.com> on 2013/04/03 07:12:44 UTC

Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?

Aaron,

I added -Dcassandra.load_ring_state=false to cassandra-env.sh and did a
rolling restart. With one node on 1.2.3 and the other 11 nodes on 1.1.10,
the 1.1.10 nodes saw the 1.2.3 node, but gossip on the 1.2.3 node now only
sees itself.
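
For reference, this is roughly how I am comparing the two views (the host
names are placeholders; nodetool talks to JMX, port 7199 by default):

    # view from one of the 1.1.10 nodes
    nodetool -h <a-1.1.10-node> ring
    nodetool -h <a-1.1.10-node> gossipinfo

    # view from the upgraded 1.2.3 node
    nodetool -h <the-1.2.3-node> ring
    nodetool -h <the-1.2.3-node> gossipinfo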

Cheers,
-Arya


On Thu, Mar 28, 2013 at 1:02 PM, Arya Goudarzi <go...@gmail.com> wrote:

> There has been a little misunderstanding. When all nodes are on 1.2.2, they
> are fine. But during the rolling upgrade, the 1.2.2 nodes see the 1.1.10 nodes
> as down in the nodetool output despite gossip reporting them as NORMAL. I will
> give your suggestion a try and will report back.
>
>
>> On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> So all nodes are 1.2 and some are still being marked as down?
>>
>> I would try a rolling restart with -Dcassandra.load_ring_state=false
>> added to JVM_OPTS in cassandra-env.sh. There is no guarantee it will fix
>> it, but it's a simple thing to try.
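>>
>> If it helps, the rolling restart could be something along these lines (just a
>> sketch; the host list, service name, drain step, and pause are placeholders
>> for whatever your setup uses):
>>
>>     for host in node1 node2 node3; do
>>         ssh "$host" 'nodetool drain && sudo service cassandra restart'
>>         sleep 120   # give gossip a couple of minutes to settle
>>     done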
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/03/2013, at 10:30 AM, Arya Goudarzi <go...@gmail.com> wrote:
>>
>> I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10
>> before upgrading to 1.2.2, but the issue with nodetool ring reporting
>> machines as down was not resolved.
>>
>> On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <go...@gmail.com> wrote:
>>
>>> Thank you very much, Aaron. I recall that the logs of the node upgraded
>>> to 1.2.2 reported seeing the others as dead. Brandon suggested in
>>> https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at
>>> least upgrade from 1.1.7. So, I decided to try upgrading to 1.1.10 first
>>> before upgrading to 1.2.2. I am in the middle of troubleshooting some other
>>> issues I had with that upgrade (posted separately); once I am done, I will
>>> give your suggestion a try.
>>>
>>>
>>> On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> really see the other ones as dead?
>>>> Is the 1.2.2 node, which sees all the others as down, still processing
>>>> requests?
>>>> Is it showing the others as down in the log?
>>>>
>>>> I'm not really sure what's happening. But you can try starting the
>>>> 1.2.2 node with the
>>>>
>>>> -Dcassandra.load_ring_state=false
>>>>
>>>> parameter; append it at the bottom of the cassandra-env.sh file. This
>>>> will force the node to get the ring state from the others.
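>>>>
>>>> For example, the appended line would look something like this (a sketch;
>>>> adjust the path to wherever your install keeps cassandra-env.sh):
>>>>
>>>>     # make the node rebuild its view of the ring from gossip instead of
>>>>     # loading the locally saved ring state
>>>>     JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"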
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 8/03/2013, at 10:24 PM, Arya Goudarzi <go...@gmail.com> wrote:
>>>>
>>>> > OK. I upgraded one node from 1.1.6 to 1.2.2 today. Aside from some new
>>>> problems that I posted about in a separate email, this issue still exists,
>>>> but now only on the 1.2.2 node. The nodes running 1.1.6 see all other nodes,
>>>> including the 1.2.2 one, as Up. Here are the ring and gossip outputs from one
>>>> of the 1.1.6 nodes; the upgraded node is XX.231.121:
>>>> >
>>>> > Address       DC       Rack  Status  State   Load      Effective-Ownership  Token
>>>> >                                                                              141784319550391026443072753098378663700
>>>> > XX.180.36     us-east  1b    Up      Normal  49.47 GB  25.00%               1808575600
>>>> > XX.231.121    us-east  1c    Up      Normal  47.08 GB  25.00%               7089215977519551322153637656637080005
>>>> > XX.177.177    us-east  1d    Up      Normal  33.64 GB  25.00%               14178431955039102644307275311465584410
>>>> > XX.7.148      us-east  1b    Up      Normal  41.27 GB  25.00%               42535295865117307932921825930779602030
>>>> > XX.20.9       us-east  1c    Up      Normal  38.51 GB  25.00%               49624511842636859255075463585608106435
>>>> > XX.86.255     us-east  1d    Up      Normal  34.78 GB  25.00%               56713727820156410577229101240436610840
>>>> > XX.63.230     us-east  1b    Up      Normal  38.11 GB  25.00%               85070591730234615865843651859750628460
>>>> > XX.163.36     us-east  1c    Up      Normal  44.25 GB  25.00%               92159807707754167187997289514579132865
>>>> > XX.31.234     us-east  1d    Up      Normal  44.66 GB  25.00%               99249023685273718510150927169407637270
>>>> > XX.132.169    us-east  1b    Up      Normal  44.2 GB   25.00%               127605887595351923798765477788721654890
>>>> > XX.71.63      us-east  1c    Up      Normal  38.74 GB  25.00%               134695103572871475120919115443550159295
>>>> > XX.197.209    us-east  1d    Up      Normal  41.5 GB   25.00%               141784319550391026443072753098378663700
>>>> >
>>>> > /XX.71.63
>>>> >   RACK:1c
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.1598705272E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.194.92
>>>> >   STATUS:NORMAL,134695103572871475120919115443550159295
>>>> >   RPC_ADDRESS:XX.194.92
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.86.255
>>>> >   RACK:1d
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:3.734334162E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.6.195
>>>> >   STATUS:NORMAL,56713727820156410577229101240436610840
>>>> >   RPC_ADDRESS:XX.6.195
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.7.148
>>>> >   RACK:1b
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.4316975808E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.47.250
>>>> >   STATUS:NORMAL,42535295865117307932921825930779602030
>>>> >   RPC_ADDRESS:XX.47.250
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.63.230
>>>> >   RACK:1b
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.0918593305E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.89.127
>>>> >   STATUS:NORMAL,85070591730234615865843651859750628460
>>>> >   RPC_ADDRESS:XX.89.127
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.132.169
>>>> >   RACK:1b
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.745883458E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.94.161
>>>> >   STATUS:NORMAL,127605887595351923798765477788721654890
>>>> >   RPC_ADDRESS:XX.94.161
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.180.36
>>>> >   RACK:1b
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:5.311963027E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.123.112
>>>> >   STATUS:NORMAL,1808575600
>>>> >   RPC_ADDRESS:XX.123.112
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.163.36
>>>> >   RACK:1c
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.7516755022E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.163.180
>>>> >   STATUS:NORMAL,92159807707754167187997289514579132865
>>>> >   RPC_ADDRESS:XX.163.180
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.31.234
>>>> >   RACK:1d
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.7954372912E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.192.159
>>>> >   STATUS:NORMAL,99249023685273718510150927169407637270
>>>> >   RPC_ADDRESS:XX.192.159
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.197.209
>>>> >   RACK:1d
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.4558968005E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.66.205
>>>> >   STATUS:NORMAL,141784319550391026443072753098378663700
>>>> >   RPC_ADDRESS:XX.66.205
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.177.177
>>>> >   RACK:1d
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:3.6115572697E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.65.57
>>>> >   STATUS:NORMAL,14178431955039102644307275311465584410
>>>> >   RPC_ADDRESS:XX.65.57
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.20.9
>>>> >   RACK:1c
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   LOAD:4.1352503882E10
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.33.229
>>>> >   STATUS:NORMAL,49624511842636859255075463585608106435
>>>> >   RPC_ADDRESS:XX.33.229
>>>> >   RELEASE_VERSION:1.1.6
>>>> > /XX.231.121
>>>> >   RACK:1c
>>>> >   SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>>>> >   X4:9c765678-d058-4d85-a588-638ce10ff984
>>>> >   X3:7
>>>> >   DC:us-east
>>>> >   INTERNAL_IP:XX.223.241
>>>> >   RPC_ADDRESS:XX.223.241
>>>> >   RELEASE_VERSION:1.2.2
>>>> >
>>>> > Now nodetool on the 1.2.2 node shows all nodes as Down except itself.
>>>> Gossipinfo looks good, though:
>>>> >
>>>> > Datacenter: us-east
>>>> > ==========
>>>> > Replicas: 3
>>>> >
>>>> > Address       Rack  Status  State   Load      Owns    Token
>>>> >                                                        56713727820156410577229101240436610840
>>>> > XX.132.169    1b    Down    Normal  44.2 GB   25.00%  127605887595351923798765477788721654890
>>>> > XX.7.148      1b    Down    Normal  41.27 GB  25.00%  42535295865117307932921825930779602030
>>>> > XX.180.36     1b    Down    Normal  49.47 GB  25.00%  1808575600
>>>> > XX.63.230     1b    Down    Normal  38.11 GB  25.00%  85070591730234615865843651859750628460
>>>> > XX.231.121    1c    Up      Normal  47.25 GB  25.00%  7089215977519551322153637656637080005
>>>> > XX.71.63      1c    Down    Normal  38.74 GB  25.00%  134695103572871475120919115443550159295
>>>> > XX.177.177    1d    Down    Normal  33.64 GB  25.00%  14178431955039102644307275311465584410
>>>> > XX.31.234     1d    Down    Normal  44.66 GB  25.00%  99249023685273718510150927169407637270
>>>> > XX.20.9       1c    Down    Normal  38.51 GB  25.00%  49624511842636859255075463585608106435
>>>> > XX.163.36     1c    Down    Normal  44.25 GB  25.00%  92159807707754167187997289514579132865
>>>> > XX.197.209    1d    Down    Normal  41.5 GB   25.00%  141784319550391026443072753098378663700
>>>> > XX.86.255     1d    Down    Normal  34.78 GB  25.00%  56713727820156410577229101240436610840
>>>> >
>>>> > /XX.71.63
>>>> >   RACK:1c
>>>> >   RPC_ADDRESS:XX.194.92
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.194.92
>>>> >   STATUS:NORMAL,134695103572871475120919115443550159295
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.1598705272E10
>>>> > /XX.86.255
>>>> >   RACK:1d
>>>> >   RPC_ADDRESS:XX.6.195
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.6.195
>>>> >   STATUS:NORMAL,56713727820156410577229101240436610840
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:3.7343205002E10
>>>> > /XX.7.148
>>>> >   RACK:1b
>>>> >   RPC_ADDRESS:XX.47.250
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.47.250
>>>> >   STATUS:NORMAL,42535295865117307932921825930779602030
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.4316975808E10
>>>> > /XX.63.230
>>>> >   RACK:1b
>>>> >   RPC_ADDRESS:XX.89.127
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.89.127
>>>> >   STATUS:NORMAL,85070591730234615865843651859750628460
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.0918456687E10
>>>> > /XX.132.169
>>>> >   RACK:1b
>>>> >   RPC_ADDRESS:XX.94.161
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.94.161
>>>> >   STATUS:NORMAL,127605887595351923798765477788721654890
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.745883458E10
>>>> > /XX.180.36
>>>> >   RACK:1b
>>>> >   RPC_ADDRESS:XX.123.112
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.123.112
>>>> >   STATUS:NORMAL,1808575600
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:5.311963027E10
>>>> > /XX.163.36
>>>> >   RACK:1c
>>>> >   RPC_ADDRESS:XX.163.180
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.163.180
>>>> >   STATUS:NORMAL,92159807707754167187997289514579132865
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.7516755022E10
>>>> > /XX.31.234
>>>> >   RACK:1d
>>>> >   RPC_ADDRESS:XX.192.159
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.192.159
>>>> >   STATUS:NORMAL,99249023685273718510150927169407637270
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.7954372912E10
>>>> > /XX.197.209
>>>> >   RACK:1d
>>>> >   RPC_ADDRESS:XX.66.205
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.66.205
>>>> >   STATUS:NORMAL,141784319550391026443072753098378663700
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.4559013211E10
>>>> > /XX.177.177
>>>> >   RACK:1d
>>>> >   RPC_ADDRESS:XX.65.57
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.65.57
>>>> >   STATUS:NORMAL,14178431955039102644307275311465584410
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:3.6115572697E10
>>>> > /XX.20.9
>>>> >   RACK:1c
>>>> >   RPC_ADDRESS:XX.33.229
>>>> >   RELEASE_VERSION:1.1.6
>>>> >   INTERNAL_IP:XX.33.229
>>>> >   STATUS:NORMAL,49624511842636859255075463585608106435
>>>> >   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> >   DC:us-east
>>>> >   LOAD:4.1352367264E10
>>>> > /XX.231.121
>>>> >   HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>>>> >   RACK:1c
>>>> >   RPC_ADDRESS:XX.223.241
>>>> >   RELEASE_VERSION:1.2.2
>>>> >   INTERNAL_IP:XX.223.241
>>>> >   STATUS:NORMAL,7089215977519551322153637656637080005
>>>> >   NET_VERSION:7
>>>> >   SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>>>> >   SEVERITY:0.0
>>>> >   DC:us-east
>>>> >   LOAD:5.0710624207E10
>>>> >
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> really see the other ones as dead?
>>>> >
>>>> > -Arya
>>>> >
>>>> >
>>>> > On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <go...@gmail.com>
>>>> wrote:
>>>> > No, I did not look at nodetool gossipinfo, but from the ring output on
>>>> both the pre-upgrade nodes and the nodes upgraded to 1.2.1, what I observed
>>>> was the behavior described.
>>>> >
>>>> >
>>>> > On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <
>>>> mkjellman@barracuda.com> wrote:
>>>> > This was a bug in 1.2.0 but it was resolved in 1.2.1. Did you take a
>>>> capture of nodetool gossipinfo and nodetool ring by chance?
>>>> >
>>>> > On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <go...@gmail.com>
>>>> wrote:
>>>> >
>>>> > > Hi C* users,
>>>> > >
>>>> > > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
>>>> noticed from nodetool ring was that the newly upgraded nodes only saw each
>>>> other as Normal and saw the rest of the cluster, still on 1.1.6, as Down.
>>>> Vice versa was true for the nodes running 1.1.6: they saw each other as
>>>> Normal but the 1.2.1 nodes as Down. I don't see a note in the upgrade docs
>>>> that this would be an issue. Has anyone else observed this problem?
>>>> > >
>>>> > > In the debug logs I could see messages about attempting to connect to
>>>> a node's IP and then marking it as down.
>>>> > >
>>>> > > Cheers,
>>>> > > -Arya
>>>> >
>>>>
>>>>
>>>
>>
>>
>