Posted to user@cassandra.apache.org by David Tinker <da...@gmail.com> on 2021/03/01 10:48:29 UTC

Recovery after server crash 4.0b3

Hi Guys

I have a 3-node cluster running 4.0b3 with all data replicated to all 3
nodes. This morning one of the servers started rebooting at random (up for a
minute or two, then rebooting again) and kept doing so for a couple of hours.
The cluster continued running normally during this time (nice!).

My hosting company has replaced the server and moved the drives across. Is
it safe for me to boot the machine and let it join the cluster?

Thanks
David

Re: Recovery after server crash 4.0b3

Posted by David Tinker <da...@gmail.com>.
Thanks guys. The IP address hasn't changed so I will go ahead and start the
server and repair.

On Mon, Mar 1, 2021 at 1:50 PM Erick Ramirez <er...@datastax.com>
wrote:

> If the node has been down for less than gc_grace_seconds and the data
> on the drives is intact, you should be fine just booting the server and it
> will join the cluster. You will need to run a repair so it picks up the
> missed mutations.
>
> @Bowen FWIW no need to do a "replace" -- the node will just take over the
> new IP. You'll just see a warning in the system.log that looks like:
>
> Not updating host ID <host_id> for <ip> because it's mine
>
>
> See
> https://github.com/apache/cassandra/blob/cassandra-4.0-beta3/src/java/org/apache/cassandra/service/StorageService.java#L2620.
> Cheers!
>

Re: Recovery after server crash 4.0b3

Posted by Erick Ramirez <er...@datastax.com>.
If the node has been down for less than gc_grace_seconds and the data on
the drives is intact, you should be fine just booting the server and it
will join the cluster. You will need to run a repair so it picks up the
missed mutations.
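
As a rough sketch of the check described above, in Python: the helper name and
the timestamps are hypothetical, and 864000 seconds is Cassandra's default
gc_grace_seconds, which individual tables may override. It only compares the
outage length with the grace period and lists the nodetool commands to run
once the node is up.

```python
from datetime import datetime, timedelta

# Cassandra's default gc_grace_seconds (10 days); tables may override it.
DEFAULT_GC_GRACE_SECONDS = 864000


def safe_to_rejoin(down_since, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if the outage is shorter than gc_grace_seconds.

    Hypothetical helper: it only compares timestamps and does not
    verify that the data on disk is intact.
    """
    return (now - down_since) < timedelta(seconds=gc_grace_seconds)


# Commands to run once the node is back up (standard nodetool subcommands).
POST_START_COMMANDS = [
    ["nodetool", "status"],  # confirm the node is reported as UN (Up/Normal)
    ["nodetool", "repair"],  # bring its data back in sync with the other replicas
]

if __name__ == "__main__":
    # Illustrative timestamps only: a four-hour outage.
    down_since = datetime(2021, 3, 1, 8, 0)
    now = datetime(2021, 3, 1, 12, 0)
    print(safe_to_rejoin(down_since, now))
```

If the check fails for any table with a shorter gc_grace_seconds, booting the
node as-is is no longer covered by the advice in this message.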

@Bowen FWIW no need to do a "replace" -- the node will just take over the
new IP. You'll just see a warning in the system.log that looks like:

Not updating host ID <host_id> for <ip> because it's mine
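
To spot that warning after the restart, a small Python sketch like the one
below could scan the log. The regular expression is built from the message
text quoted above; the function name is hypothetical, and the host ID and
address are matched loosely because their exact layout is an assumption.

```python
import re

# Pattern taken from the warning text quoted above. Host ID and endpoint are
# matched as any run of non-whitespace, since their exact format is assumed.
HOST_ID_WARNING = re.compile(
    r"Not updating host ID (?P<host_id>\S+) for (?P<endpoint>\S+) because it's mine"
)


def find_host_id_warnings(lines):
    """Yield (host_id, endpoint) for every line containing the warning."""
    for line in lines:
        match = HOST_ID_WARNING.search(line)
        if match:
            yield match.group("host_id"), match.group("endpoint")
```

It accepts any iterable of lines, for example an open file handle on
system.log (the file's location depends on how Cassandra was installed).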


See
https://github.com/apache/cassandra/blob/cassandra-4.0-beta3/src/java/org/apache/cassandra/service/StorageService.java#L2620.
Cheers!

Re: Recovery after server crash 4.0b3

Posted by Bowen Song <bo...@bso.ng.INVALID>.
Has the IP address changed?

If the IP address hasn't changed and the data is still on disk, you 
should be able to start this node and it will become available again. 
Note: you may need to repair this node after that.

However, if the IP address has changed as the result of replacing the 
server, you will need to replace the dead node 
<https://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node>.
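
The two cases above can be written down as a small decision helper. This is
only an illustration of the reasoning in this message: the function name and
return strings are made up, and the JVM option named in the comment is the one
described in the linked "replacing a dead node" documentation.

```python
def recovery_action(ip_changed, data_intact):
    """Map the two conditions discussed in this message to a next step.

    Hypothetical helper for illustration; it does not talk to Cassandra.
    """
    if data_intact and not ip_changed:
        return "start the node, then run a repair"
    if ip_changed:
        # The linked procedure starts the new node with the JVM option
        # -Dcassandra.replace_address_first_boot=<dead_node_ip>
        return "follow the replacing-a-dead-node procedure"
    return "not covered by this message"
```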

