Posted to user@cassandra.apache.org by Arya Goudarzi <go...@gmail.com> on 2013/04/03 07:12:44 UTC
Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?
Aaron,
I added -Dcassandra.load_ring_state=false to cassandra-env.sh and did a
rolling restart. With one node on 1.2.3 and the other 11 nodes on 1.1.10,
the 1.1.10 nodes saw the 1.2.3 node, but gossip on the 1.2.3 node now only
sees itself.
Cheers,
-Arya
On Thu, Mar 28, 2013 at 1:02 PM, Arya Goudarzi <go...@gmail.com> wrote:
> There has been a little misunderstanding. When all nodes are on 1.2.2, they
> are fine. But during the rolling upgrade, the 1.2.2 nodes see the 1.1.10 nodes as
> down in nodetool despite gossip reporting NORMAL. I will give your
> suggestion a try and will report back.
>
>
> On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> So all nodes are 1.2 and some are still being marked as down ?
>>
>> I would try a rolling restart with -Dcassandra.load_ring_state=false
>> added to JVM_OPTS in cassandra-env.sh. There is no guarantee it will fix
>> it, but it's a simple thing to try.
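For illustration, the rolling restart suggested here could be scripted roughly as below. This is a dry-run sketch only: the host names, the `service cassandra restart` command, and the wait interval are assumptions, not details from this thread.

```shell
# Dry-run sketch of a rolling restart; swap the echo for a real
# ssh invocation on an actual cluster.
restarted=0
for host in node1 node2 node3; do
  # Assumes -Dcassandra.load_ring_state=false is already set in each
  # host's cassandra-env.sh via JVM_OPTS.
  echo "would run: ssh $host 'sudo service cassandra restart'"
  restarted=$((restarted + 1))
  # On a real cluster, wait for the node to rejoin gossip before
  # restarting the next one, e.g.: sleep 120
done
echo "planned restarts: $restarted"
```

Restarting one node at a time, and waiting for it to rejoin gossip, keeps the cluster serving requests throughout.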
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/03/2013, at 10:30 AM, Arya Goudarzi <go...@gmail.com> wrote:
>>
>> I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10
>> before upgrading to 1.2.2, but the issue with nodetool ring reporting
>> machines as down was not resolved.
>>
>> On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <go...@gmail.com> wrote:
>>
>>> Thank you very much, Aaron. I recall that the logs of the node upgraded
>>> to 1.2.2 reported seeing the others as dead. Brandon suggested in
>>> https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at
>>> least upgrade from 1.1.7. So, I decided to try upgrading to 1.1.10 first
>>> before upgrading to 1.2.2. I am in the middle of troubleshooting some other
>>> issues I had with that upgrade (posted separately), once I am done, I will
>>> give your suggestion a try.
>>>
>>>
>>> On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> really see the other ones as dead?
>>>> Is the 1.2.2 node which sees all the others as down still processing
>>>> requests?
>>>> Is it showing the others as down in the log?
>>>>
>>>> I'm not really sure what's happening. But you can try starting the
>>>> 1.2.2 node with the
>>>>
>>>> -Dcassandra.load_ring_state=false
>>>>
>>>> parameter; append it at the bottom of the cassandra-env.sh file. This
>>>> will force the node to fetch the ring state from the others.
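Concretely, the appended line would look something like this (a sketch only; the exact location of cassandra-env.sh depends on the install):

```shell
# At the bottom of cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```

Operators typically remove the line again once the ring has stabilized, so later restarts use the saved ring state as usual.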
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 8/03/2013, at 10:24 PM, Arya Goudarzi <go...@gmail.com> wrote:
>>>>
>>>> > OK. I upgraded one node from 1.1.6 to 1.2.2 today. Despite some new
>>>> problems, which I posted about in a separate email, this issue still
>>>> exists, but now only on the 1.2.2 node. That is, the nodes running
>>>> 1.1.6 see all other nodes, including the 1.2.2 one, as Up. Here are the
>>>> ring and gossip outputs from a 1.1.6 node, for example (the upgraded node
>>>> is XX.231.121, per the gossip entry below showing RELEASE_VERSION:1.2.2):
>>>> >
>>>> > Address     DC       Rack  Status  State   Load      Effective-Ownership  Token
>>>> >                                                                           141784319550391026443072753098378663700
>>>> > XX.180.36   us-east  1b    Up      Normal  49.47 GB  25.00%               1808575600
>>>> > XX.231.121  us-east  1c    Up      Normal  47.08 GB  25.00%               7089215977519551322153637656637080005
>>>> > XX.177.177  us-east  1d    Up      Normal  33.64 GB  25.00%               14178431955039102644307275311465584410
>>>> > XX.7.148    us-east  1b    Up      Normal  41.27 GB  25.00%               42535295865117307932921825930779602030
>>>> > XX.20.9     us-east  1c    Up      Normal  38.51 GB  25.00%               49624511842636859255075463585608106435
>>>> > XX.86.255   us-east  1d    Up      Normal  34.78 GB  25.00%               56713727820156410577229101240436610840
>>>> > XX.63.230   us-east  1b    Up      Normal  38.11 GB  25.00%               85070591730234615865843651859750628460
>>>> > XX.163.36   us-east  1c    Up      Normal  44.25 GB  25.00%               92159807707754167187997289514579132865
>>>> > XX.31.234   us-east  1d    Up      Normal  44.66 GB  25.00%               99249023685273718510150927169407637270
>>>> > XX.132.169  us-east  1b    Up      Normal  44.2 GB   25.00%               127605887595351923798765477788721654890
>>>> > XX.71.63    us-east  1c    Up      Normal  38.74 GB  25.00%               134695103572871475120919115443550159295
>>>> > XX.197.209  us-east  1d    Up      Normal  41.5 GB   25.00%               141784319550391026443072753098378663700
>>>> >
>>>> > /XX.71.63
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.1598705272E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.194.92
>>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>>> > RPC_ADDRESS:XX.194.92
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.86.255
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:3.734334162E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.6.195
>>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>>> > RPC_ADDRESS:XX.6.195
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.7.148
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.4316975808E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.47.250
>>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>>> > RPC_ADDRESS:XX.47.250
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.63.230
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.0918593305E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.89.127
>>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>>> > RPC_ADDRESS:XX.89.127
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.132.169
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.745883458E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.94.161
>>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>>> > RPC_ADDRESS:XX.94.161
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.180.36
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:5.311963027E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.123.112
>>>> > STATUS:NORMAL,1808575600
>>>> > RPC_ADDRESS:XX.123.112
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.163.36
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.7516755022E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.163.180
>>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>>> > RPC_ADDRESS:XX.163.180
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.31.234
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.7954372912E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.192.159
>>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>>> > RPC_ADDRESS:XX.192.159
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.197.209
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.4558968005E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.66.205
>>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>>> > RPC_ADDRESS:XX.66.205
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.177.177
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:3.6115572697E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.65.57
>>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>>> > RPC_ADDRESS:XX.65.57
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.20.9
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.1352503882E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.33.229
>>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>>> > RPC_ADDRESS:XX.33.229
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.231.121
>>>> > RACK:1c
>>>> > SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>>>> > X4:9c765678-d058-4d85-a588-638ce10ff984
>>>> > X3:7
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.223.241
>>>> > RPC_ADDRESS:XX.223.241
>>>> > RELEASE_VERSION:1.2.2
>>>> >
>>>> > Now nodetool on the 1.2.2 node shows all nodes as Down except
>>>> itself, although gossipinfo looks good:
>>>> >
>>>> > Datacenter: us-east
>>>> > ==========
>>>> > Replicas: 3
>>>> >
>>>> > Address     Rack  Status  State   Load      Owns    Token
>>>> >                                                     56713727820156410577229101240436610840
>>>> > XX.132.169  1b    Down    Normal  44.2 GB   25.00%  127605887595351923798765477788721654890
>>>> > XX.7.148    1b    Down    Normal  41.27 GB  25.00%  42535295865117307932921825930779602030
>>>> > XX.180.36   1b    Down    Normal  49.47 GB  25.00%  1808575600
>>>> > XX.63.230   1b    Down    Normal  38.11 GB  25.00%  85070591730234615865843651859750628460
>>>> > XX.231.121  1c    Up      Normal  47.25 GB  25.00%  7089215977519551322153637656637080005
>>>> > XX.71.63    1c    Down    Normal  38.74 GB  25.00%  134695103572871475120919115443550159295
>>>> > XX.177.177  1d    Down    Normal  33.64 GB  25.00%  14178431955039102644307275311465584410
>>>> > XX.31.234   1d    Down    Normal  44.66 GB  25.00%  99249023685273718510150927169407637270
>>>> > XX.20.9     1c    Down    Normal  38.51 GB  25.00%  49624511842636859255075463585608106435
>>>> > XX.163.36   1c    Down    Normal  44.25 GB  25.00%  92159807707754167187997289514579132865
>>>> > XX.197.209  1d    Down    Normal  41.5 GB   25.00%  141784319550391026443072753098378663700
>>>> > XX.86.255   1d    Down    Normal  34.78 GB  25.00%  56713727820156410577229101240436610840
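Up/Down counts in output like the above can be tallied mechanically when comparing captures from several nodes. Here is a small sketch; the here-doc is an abbreviated stand-in for a real `nodetool ring` capture, and the column positions are assumptions based on the output above:

```shell
# Count Up vs Down entries in a saved `nodetool ring` capture.
# The sample data is a trimmed stand-in for real output.
cat > /tmp/ring-sample.txt <<'EOF'
XX.132.169  1b  Down  Normal  44.2 GB   25.00%
XX.231.121  1c  Up    Normal  47.25 GB  25.00%
XX.71.63    1c  Down  Normal  38.74 GB  25.00%
EOF
# Field 3 is the Status column in this layout.
awk '$3 == "Down" { down++ }
     $3 == "Up"   { up++ }
     END { printf "Up=%d Down=%d\n", up, down }' /tmp/ring-sample.txt
```

On the sample above this prints `Up=1 Down=2`.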
>>>> >
>>>> > /XX.71.63
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.194.92
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.194.92
>>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.1598705272E10
>>>> > /XX.86.255
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.6.195
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.6.195
>>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:3.7343205002E10
>>>> > /XX.7.148
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.47.250
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.47.250
>>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.4316975808E10
>>>> > /XX.63.230
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.89.127
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.89.127
>>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.0918456687E10
>>>> > /XX.132.169
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.94.161
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.94.161
>>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.745883458E10
>>>> > /XX.180.36
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.123.112
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.123.112
>>>> > STATUS:NORMAL,1808575600
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:5.311963027E10
>>>> > /XX.163.36
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.163.180
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.163.180
>>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.7516755022E10
>>>> > /XX.31.234
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.192.159
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.192.159
>>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.7954372912E10
>>>> > /XX.197.209
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.66.205
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.66.205
>>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.4559013211E10
>>>> > /XX.177.177
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.65.57
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.65.57
>>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:3.6115572697E10
>>>> > /XX.20.9
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.33.229
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.33.229
>>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.1352367264E10
>>>> > /XX.231.121
>>>> > HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.223.241
>>>> > RELEASE_VERSION:1.2.2
>>>> > INTERNAL_IP:XX.223.241
>>>> > STATUS:NORMAL,7089215977519551322153637656637080005
>>>> > NET_VERSION:7
>>>> > SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>>>> > SEVERITY:0.0
>>>> > DC:us-east
>>>> > LOAD:5.0710624207E10
>>>> >
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> really see the other ones as dead?
>>>> >
>>>> > -Arya
>>>> >
>>>> >
>>>> > On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <go...@gmail.com>
>>>> wrote:
>>>> > No, I did not look at nodetool gossipinfo, but the described behavior
>>>> is what I observed from nodetool ring on both the pre-upgrade nodes and
>>>> the nodes upgraded to 1.2.1.
>>>> >
>>>> >
>>>> > On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <mkjellman@barracuda.com> wrote:
>>>> > This was a bug in 1.2.0 that was resolved in 1.2.1. Did you happen to
>>>> take a capture of nodetool gossipinfo and nodetool ring?
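Capturing both outputs around each upgrade step makes before/after comparison straightforward. A dry-run sketch (the host names are placeholders, not from this thread; on a real cluster the commands would be run against each live node):

```shell
# Plan diagnostic captures per node; swap echo for real execution.
STAMP=$(date +%Y%m%d-%H%M%S)
for host in node1 node2; do
  echo "would run: nodetool -h $host gossipinfo > gossip-$host-$STAMP.txt"
  echo "would run: nodetool -h $host ring > ring-$host-$STAMP.txt"
done
```

Timestamped filenames let each capture be diffed against the previous one after every node restart.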
>>>> >
>>>> > On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <go...@gmail.com>
>>>> wrote:
>>>> >
>>>> > > Hi C* users,
>>>> > >
>>>> > > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
>>>> noticed from nodetool ring was that the newly upgraded nodes only saw each
>>>> other as Normal, and saw the rest of the cluster, still on 1.1.6, as Down.
>>>> Vice versa was true for the nodes running 1.1.6: they saw each other as
>>>> Normal but the 1.2.1 nodes as Down. I don't see a note in the upgrade docs
>>>> that this would be an issue. Has anyone else observed this problem?
>>>> > >
>>>> > > In the debug logs I could see messages about attempting to connect
>>>> to a node's IP, followed by messages saying it is down.
>>>> > >
>>>> > > Cheers,
>>>> > > -Arya
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>>
>