Posted to user@cassandra.apache.org by Joe Obernberger <jo...@gmail.com> on 2022/06/09 15:26:28 UTC
Re: removing a drive - 4.0.1
When a drive fails in a large cluster and you don't immediately have a
replacement drive, is it OK to just remove the drive from cassandra.yaml
and restart the node? Will the missing data (assuming RF=3) be
re-replicated?
I have disk_failure_policy set to "best_effort", but the node still
fails (i.e., Cassandra exits) when a disk (spinning rust) goes bad.
I do have commit_failure_policy set to stop.
Thank you!
-Joe
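
For reference, the two settings mentioned above live in cassandra.yaml; as described in this message they would read:

```yaml
# cassandra.yaml (values as described in the message above)
disk_failure_policy: best_effort    # stop using the failed disk, keep serving from the remaining ones
commit_failure_policy: stop         # shut down gossip and client transports on commit log errors
```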
On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:
> There is a jira ticket describing your situation
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793
>
> I may be wrong, but it seems that the system directories are pinned to
> the first data directory in cassandra.yaml by default. When you removed
> the first item from the list, the system data was regenerated in the new
> first directory in the list, and then merged(?) when the original first
> directory returned.
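
That ticket led to a cassandra.yaml option in 4.x that decouples the local system keyspaces from the first data directory. A sketch, assuming Cassandra 4.0+; the path is an illustrative example:

```yaml
# Pin local system keyspaces to a dedicated directory so that losing the
# first entry in data_file_directories does not take the node's identity
# with it. Illustrative path; requires Cassandra 4.0+ (CASSANDRA-14793).
local_system_data_file_directory: /var/lib/cassandra/system
```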
>
> On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger
> <jo...@gmail.com> wrote:
>
> Hi - in order to get the node back up and running I did the following:
> Deleted all data on the node:
> Added: -Dcassandra.replace_address=172.16.100.39
> to the cassandra.env.sh file, and
> started it up. It is currently bootstrapping.
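
Spelled out as a sequence, the in-place replacement described above looks roughly like this. It is a sketch that only prints the steps; the address is the node's own IP from this thread, and the service commands and paths are assumptions to adapt to your installation:

```shell
#!/bin/sh
# Sketch of re-bootstrapping a node in place with replace_address.
# Prints the steps rather than executing them.
REPLACE_ADDR=172.16.100.39   # the node's own address, from this thread

echo "1. stop cassandra on the node"
echo "2. delete all data under every data_file_directories entry"
echo "3. add -Dcassandra.replace_address=${REPLACE_ADDR} to the JVM options"
echo "4. start cassandra; the node bootstraps (streams its data) from peers"
```

Once the bootstrap finishes, the replace_address option should be removed again so later restarts behave normally.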
>
> In cassandra.yaml, say you have the following:
>
> data_file_directories:
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> If I change the above to:
> # - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> - /data/8/cassandra
>
> the problem happens. If I change it to:
>
> - /data/1/cassandra
> - /data/2/cassandra
> - /data/3/cassandra
> - /data/4/cassandra
> - /data/5/cassandra
> - /data/6/cassandra
> - /data/7/cassandra
> # - /data/8/cassandra
>
> the node starts up OK. I assume it will recover the missing data
> during a repair?
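
Recovering the data that lived on the removed directory is what repair is for, given RF=3. A minimal sketch that only prints the command; the keyspace name is a hypothetical example:

```shell
#!/bin/sh
# Sketch: after dropping a data directory, a repair re-streams the
# now-missing replicas from the other nodes (assumes RF=3).
KEYSPACE=my_keyspace              # hypothetical keyspace name
REPAIR_CMD="nodetool repair -pr"  # -pr: repair only this node's primary ranges

echo "${REPAIR_CMD} ${KEYSPACE}"
echo "(run with -pr on every node in turn to cover all token ranges)"
```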
>
> -Joe
>
> On 1/7/2022 4:13 PM, Mano ksio wrote:
>> Hi, you may have already tried this, but it may help:
>> https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists
>>
>>
>> Can you elaborate a little on 'If I remove a drive other than the
>> first one'? What does that mean?
>>
>> On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger
>> <jo...@gmail.com> wrote:
>>
>> Hi All - I have a 13 node cluster running Cassandra 4.0.1. If I stop a
>> node, edit the cassandra.yaml file, comment out the first drive in the
>> list, and restart the node, it fails to start, saying that a node
>> already exists in the cluster with that IP address.
>>
>> If I put the drive back into the list, the node still fails to start
>> with the same error. At this point the node is useless and I think the
>> only option is to remove all the data and re-bootstrap it?
>> ---------
>>
>> ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
>> Exception encountered during startup
>> java.lang.RuntimeException: A node with address /172.16.100.39:7000
>> already exists, cancelling join. Use cassandra.replace_address if you
>> want to replace this node.
>> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
>> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
>>
>> -----------
>>
>> If I remove a drive other than the first one, this problem doesn't
>> occur. Any other options? It appears that if the first drive in the
>> list goes bad, or is just removed, the entire node must be replaced.
>>
>> -Joe
>>
>>
>