Posted to user@cassandra.apache.org by Joe Obernberger <jo...@gmail.com> on 2022/06/09 15:26:28 UTC
Re: removing a drive - 4.0.1
When a drive fails in a large cluster and you don't immediately have a
replacement drive, is it OK to just remove the drive from cassandra.yaml
and restart the node? Will the missing data (assuming RF=3) be
re-replicated?
I have disk_failure_policy set to "best_effort", but the node still
fails (i.e., Cassandra exits) when a disk (spinning rust) goes bad.
I do have commit_failure_policy set to stop.
Thank you!
-Joe
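
For reference, the two settings mentioned above live in cassandra.yaml; as described in this message they would read:

```yaml
# cassandra.yaml (values as described in the message above)
disk_failure_policy: best_effort    # stop using the failed disk, keep serving from the remaining ones
commit_failure_policy: stop         # shut down gossip and client transports on commit log errors
```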
On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:
> There is a jira ticket describing your situation
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793
>
> I may be wrong, but it seems that the system directories are pinned to
> the first data directory in cassandra.yaml by default. When you removed
> the first item from the list, the system data was regenerated in the new
> first directory in the list, and then merged(?) when the original first
> directory returned.
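
That ticket led to a cassandra.yaml option in 4.x that decouples the local system keyspaces from the first data directory. A sketch, assuming Cassandra 4.0+; the path is an illustrative example:

```yaml
# Pin local system keyspaces to a dedicated directory so that losing the
# first entry in data_file_directories does not take the node's identity
# with it. Illustrative path; requires Cassandra 4.0+ (CASSANDRA-14793).
local_system_data_file_directory: /var/lib/cassandra/system
```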
>
> On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger
> <jo...@gmail.com> wrote:
>
> Hi - in order to get the node back up and running I did the following:
> Deleted all data on the node:
> Added: -Dcassandra.replace_address=172.16.100.39
> to the cassandra.env.sh file, and
> started it up. It is currently bootstrapping.
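
Spelled out as a sequence, the in-place replacement described above looks roughly like this. It is a sketch that only prints the steps; the address is the node's own IP from this thread, and the service commands and paths are assumptions to adapt to your installation:

```shell
#!/bin/sh
# Sketch of re-bootstrapping a node in place with replace_address.
# Prints the steps rather than executing them.
REPLACE_ADDR=172.16.100.39   # the node's own address, from this thread

echo "1. stop cassandra on the node"
echo "2. delete all data under every data_file_directories entry"
echo "3. add -Dcassandra.replace_address=${REPLACE_ADDR} to the JVM options"
echo "4. start cassandra; the node bootstraps (streams its data) from peers"
```

Once the bootstrap finishes, the replace_address option should be removed again so later restarts behave normally.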
>
> In cassandra.yaml, say you have the following:
>
> data_file_directories:
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> If I change the above to:
> # - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> the problem happens. If I change it to:
>
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> # - /data/8/cassandra
>
> the node starts up OK. I assume it will recover the missing data
> during a repair?
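
Recovering the data that lived on the removed directory is what repair is for, given RF=3. A minimal sketch that only prints the command; the keyspace name is a hypothetical example:

```shell
#!/bin/sh
# Sketch: after dropping a data directory, a repair re-streams the
# now-missing replicas from the other nodes (assumes RF=3).
KEYSPACE=my_keyspace              # hypothetical keyspace name
REPAIR_CMD="nodetool repair -pr"  # -pr: repair only this node's primary ranges

echo "${REPAIR_CMD} ${KEYSPACE}"
echo "(run with -pr on every node in turn to cover all token ranges)"
```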
>
> -Joe
>
> On 1/7/2022 4:13 PM, Mano ksio wrote:
>> Hi, you may have already tried this, but it may help:
>> https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists
>>
>>
>> Can you elaborate a little on 'If I remove a drive other than the
>> first one'? What does that mean?
>>
>> On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger
>> <jo...@gmail.com> wrote:
>>
>> Hi All - I have a 13 node cluster running Cassandra 4.0.1. If I stop a
>> node, edit the cassandra.yaml file, comment out the first drive in the
>> list, and restart the node, it fails to start, saying that a node
>> already exists in the cluster with that IP address.
>>
>> If I put the drive back into the list, the node still fails to start
>> with the same error. At this point the node is useless and I think the
>> only option is to remove all the data and re-bootstrap it?
>> ---------
>>
>> ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
>> Exception encountered during startup
>> java.lang.RuntimeException: A node with address /172.16.100.39:7000
>> already exists, cancelling join. Use cassandra.replace_address if you
>> want to replace this node.
>> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
>>
>> -----------
>>
>> If I remove a drive other than the first one, this problem doesn't
>> occur. Any other options? It appears that if the first drive in the
>> list goes bad, or is just removed, the entire node must be replaced.
>>
>> -Joe
>>
>>
>