Posted to user@cassandra.apache.org by Herbert Fischer <he...@crossengage.io> on 2016/01/05 12:01:36 UTC

Node stuck when joining a Cassandra 2.2.0 cluster

We run a small Cassandra 2.2.0 cluster, with 5 nodes, on bare-metal servers
and we are going to replace those nodes with other nodes. I planned to add
all the new nodes first, one-by-one, and later remove the old ones,
one-by-one.

However, the first new node gets stuck when joining the cluster. I tried
twice and left the node running from one day to the next, but nothing
changed. I also ran `nodetool repair` on the new node.

All I see is a lot of errors like the following one referring to the old
nodes:

WARN  [MessagingService-Incoming-/10.10.10.10] 2016-01-05 09:34:26,784
IncomingTcpConnection.java:98 - UnknownColumnFamilyException reading from
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=19509db0-a011-11e5-9acd-73ec538206cc
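
To see which table IDs and which peers such warnings involve, the log can be
summarized with a short script. This is a minimal sketch in Python, assuming
the two-line warning layout shown above (unwrapped, as it appears in the log
file). The function name `summarize_unknown_cfids` is hypothetical and not part
of Cassandra.

```python
import re
from collections import Counter

# Peer address from the thread name, e.g. [MessagingService-Incoming-/10.10.10.10]
PEER_RE = re.compile(r"MessagingService-Incoming-/([0-9a-fA-F.:]+)\]")
# Table ID from the exception message, e.g. cfId=19509db0-a011-...
CFID_RE = re.compile(r"cfId=([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")


def summarize_unknown_cfids(lines):
    """Count (peer, cfId) pairs; the peer comes from the most recent WARN line."""
    counts = Counter()
    peer = None
    for line in lines:
        m = PEER_RE.search(line)
        if m:
            peer = m.group(1)
        m = CFID_RE.search(line)
        if m:
            counts[(peer, m.group(1))] += 1
    return counts


# The warning quoted above, with the mail wrapping undone.
sample = [
    "WARN  [MessagingService-Incoming-/10.10.10.10] 2016-01-05 09:34:26,784 "
    "IncomingTcpConnection.java:98 - UnknownColumnFamilyException reading from "
    "socket; closing",
    "org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find "
    "cfId=19509db0-a011-11e5-9acd-73ec538206cc",
]
for (peer, cfid), n in summarize_unknown_cfids(sample).items():
    print(peer, cfid, n)
```

If every warning names the same cfId, a single table that the joining node
does not know about is the likely culprit.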

I also see only one schema version with `nodetool describecluster`.

Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
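
The schema-version check can also be scripted. The sketch below is in Python
and assumes the usual `nodetool describecluster` layout, where each schema
version is printed as `<uuid>: [ip, ip, ...]`. The sample text, its UUID, its
cluster name and its addresses are made up for illustration, and the function
name `schema_versions` is hypothetical.

```python
import re

# One line per schema version: "<uuid>: [ip1, ip2]" or "UNREACHABLE: [ip]".
VERSION_RE = re.compile(r"^\s*([0-9a-f-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output):
    """Map each schema version (or UNREACHABLE) to the list of node addresses."""
    versions = {}
    for line in describecluster_output.splitlines():
        m = VERSION_RE.match(line)
        if m:
            nodes = [ip.strip() for ip in m.group(2).split(",") if ip.strip()]
            versions[m.group(1)] = nodes
    return versions


# Illustrative output only; the values here are invented.
sample_output = """\
Cluster Information:
    Name: Example Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        86afa796-d883-3932-aa73-6b017cef0d19: [10.10.10.10, 10.10.10.11]
"""

versions = schema_versions(sample_output)
live = {v: nodes for v, nodes in versions.items() if v != "UNREACHABLE"}
print("schema agreement" if len(live) == 1 else "schema disagreement", live)
```

A single live version means the reachable nodes agree on the schema, which
matches what is reported here.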

Any idea about what might be wrong?


-- 
Herbert

Re: Node stuck when joining a Cassandra 2.2.0 cluster

Posted by Carlos Alonso <in...@mrcalonso.com>.

Hi Robert.

I'm thinking of upgrading hardware in place. Can you please elaborate a bit
more on how to use the auto_bootstrap=false + hibernate repair technique?

Cheers!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

Re: Node stuck when joining a Cassandra 2.2.0 cluster

Posted by Herbert Fischer <he...@crossengage.io>.

Hi,

Thanks for the tip.

I found that one keyspace was in a corrupted state. It had previously been
scrubbed/deleted, but there were files left on the servers, so it was in a
strange state. After removing it from the filesystem I was able to add the
new node to the cluster. Since this keyspace was in an unknown state, I
could not find it through the cfId from the error messages.
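
One way to look for leftover files of this kind: since Cassandra 2.1, each
table's data directory is named `<table>-<id>`, where the id is the cfId with
the dashes removed. The following is a minimal Python sketch, assuming a data
root laid out as `<data_root>/<keyspace>/<table>-<id>`. The function name
`find_table_dirs` and the path `/var/lib/cassandra/data` are illustrative;
check `data_file_directories` in cassandra.yaml for the real location.

```python
from pathlib import Path


def find_table_dirs(data_root, cf_id):
    """Return table directories whose name ends with the cfId (dashes removed)."""
    suffix = "-" + cf_id.replace("-", "").lower()
    root = Path(data_root)
    if not root.is_dir():
        return []
    return sorted(
        table_dir
        for ks_dir in root.iterdir() if ks_dir.is_dir()
        for table_dir in ks_dir.iterdir()
        if table_dir.is_dir() and table_dir.name.endswith(suffix)
    )


# Assumed default data location; prints nothing if the path does not exist.
for d in find_table_dirs("/var/lib/cassandra/data",
                         "19509db0-a011-11e5-9acd-73ec538206cc"):
    print(d)
```

An empty result does not rule out stale files. In this case the cfId lookup did
not lead to the keyspace, so listing the keyspace directories on disk and
comparing them with the keyspaces the cluster reports may still be needed.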

best

-- 
Herbert Fischer | Senior IT Architect
CrossEngage GmbH (haftungsbeschränkt) | Julie-Wolfthorn-Straße 1 | 10115
Berlin

E-Mail: herbert.fischer@crossengage.io
Web: www.crossengage.io

Amtsgericht Berlin-Charlottenburg | HRB 169537 B
Geschäftsführer: Dr. Markus Wübben, Manuel Hinz | USt-IdNr.: DE301504202

Re: Node stuck when joining a Cassandra 2.2.0 cluster

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Jan 5, 2016 at 3:01 AM, Herbert Fischer <
herbert.fischer@crossengage.io> wrote:

> We run a small Cassandra 2.2.0 cluster, with 5 nodes, on bare-metal
> servers and we are going to replace those nodes with other nodes. I planned
> to add all the new nodes first, one-by-one, and later remove the old ones,
> one-by-one.
>

It sounds like your bootstraps are hanging. Your streams should restart
after an hour, but probably you want to figure out why they're hanging...

You can also use the auto_bootstrap=false+hibernate repair method for this
process. That's probably what I'd do if I was upgrading the hardware of
nodes in place.

=Rob