Posted to user@cassandra.apache.org by S C <as...@outlook.com> on 2013/01/28 18:29:09 UTC

cluster issues



One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured this out pretty quickly, I have a few questions that I am looking for answers to.
	• We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).
	• Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.

On 192.168.2.100
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Cassandra   rack1       Up     Normal  601.34 MB       33.33%              0                                           
192.168.2.101  Analytics   rack1       Down   Normal  149.75 MB       33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485     

On 192.168.2.101
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
192.168.2.101  Analytics   rack1       Up     Normal  158.59 MB       33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485     

On 192.168.2.102
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
192.168.2.101  Analytics   rack1       Down   Normal  ?               33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Up     Normal  117.02 MB       33.33%              113427455640312821154458202477256070485     

Appreciate your valuable inputs.
Thanks,
SC

Re: cluster issues

Posted by aaron morton <aa...@thelastpickle.com>.
For DataStax Enterprise specific questions, try the support forums: http://www.datastax.com/support-forums/

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 8:27 AM, S C <as...@outlook.com> wrote:

> I am using DseDelegateSnitch
> 
> Thanks,
> SC
> From: aaron@thelastpickle.com
> Subject: Re: cluster issues
> Date: Tue, 29 Jan 2013 20:15:45 +1300
> To: user@cassandra.apache.org
> 
> 	• We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory)
> There is a way to remove future-dated columns, but it is not for the faint-hearted. 
> 
> Basically:
> 1) Drop the gc_grace_seconds to 0
> 2) Delete the column with a timestamp way in the future, so it is guaranteed to be higher than the value you want to delete. 
> 3) Flush the CF
> 4) Compact all the SSTables that contain the row. The easiest way to do that is a major compaction, but we normally advise not to do that because it creates one big file. You can also do a user defined compaction. 
> 
> 	• Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.
> What snitch are you using?
> If you are using the property file snitch, do all nodes have the same configuration?
> 
> There is a lot of sickness there. If possible I would scrub and start again. 
> 
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 6:29 AM, S C <as...@outlook.com> wrote:
> 
> One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured this out pretty quickly, I have a few questions that I am looking for answers to.
> 
> 	• We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).
> 	• Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.
> 
> 
> On 192.168.2.100
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Cassandra   rack1       Up     Normal  601.34 MB       33.33%              0                                           
> 192.168.2.101  Analytics   rack1       Down   Normal  149.75 MB       33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485   
> 
> On 192.168.2.101
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                          
> 192.168.2.101  Analytics   rack1       Up     Normal  158.59 MB       33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485    
> 
> On 192.168.2.102
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                          
> 192.168.2.101  Analytics   rack1       Down   Normal  ?               33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Up     Normal  117.02 MB       33.33%              113427455640312821154458202477256070485     
> 
> 
> Appreciate your valuable inputs.
> 
> Thanks,
> SC


RE: cluster issues

Posted by S C <as...@outlook.com>.
I am using DseDelegateSnitch
Thanks,
SC
From: aaron@thelastpickle.com
Subject: Re: cluster issues
Date: Tue, 29 Jan 2013 20:15:45 +1300
To: user@cassandra.apache.org

We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).

There is a way to remove future-dated columns, but it is not for the faint-hearted. 

Basically:
1) Drop the gc_grace_seconds to 0
2) Delete the column with a timestamp way in the future, so it is guaranteed to be higher than the value you want to delete. 
3) Flush the CF
4) Compact all the SSTables that contain the row. The easiest way to do that is a major compaction, but we normally advise not to do that because it creates one big file. You can also do a user-defined compaction. 

Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.

What snitch are you using?
If you are using the property file snitch, do all nodes have the same configuration?
There is a lot of sickness there. If possible I would scrub and start again. 
Cheers 
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 29/01/2013, at 6:29 AM, S C <as...@outlook.com> wrote:





One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured this out pretty quickly, I have a few questions that I am looking for answers to.
	• We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).
	• Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.

On 192.168.2.100
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Cassandra   rack1       Up     Normal  601.34 MB       33.33%              0                                           
192.168.2.101  Analytics   rack1       Down   Normal  149.75 MB       33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485     

On 192.168.2.101
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
192.168.2.101  Analytics   rack1       Up     Normal  158.59 MB       33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485     

On 192.168.2.102
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           113427455640312821154458202477256070485     
192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
192.168.2.101  Analytics   rack1       Down   Normal  ?               33.33%              56713727820156410577229101238628035242      
192.168.2.102  Analytics   rack1       Up     Normal  117.02 MB       33.33%              113427455640312821154458202477256070485     

Appreciate your valuable inputs.
Thanks,
SC


Re: cluster issues

Posted by aaron morton <aa...@thelastpickle.com>.
> We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).
There is a way to remove future-dated columns, but it is not for the faint-hearted. 

Basically:
1) Drop the gc_grace_seconds to 0
2) Delete the column with a timestamp way in the future, so it is guaranteed to be higher than the value you want to delete. 
3) Flush the CF
4) Compact all the SSTables that contain the row. The easiest way to do that is a major compaction, but we normally advise not to do that because it creates one big file. You can also do a user-defined compaction (see the command sketch below). 
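
As a rough illustration only, those steps might look something like the following from cqlsh and the shell. The keyspace, column family, column and row key names (my_ks, my_cf, bad_col, 'row1') are placeholders, and the exact statements depend on your Cassandra/CQL version:

    -- in cqlsh: allow tombstones to be purged immediately on compaction
    ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 0;

    -- delete with a far-future timestamp so it is higher than the drifted write
    DELETE bad_col FROM my_ks.my_cf USING TIMESTAMP 9999999999999999 WHERE key = 'row1';

Then from the shell:

    # flush so the tombstone reaches an SSTable, then compact to purge it
    # (major compaction shown; a user-defined compaction via JMX avoids the one big file)
    nodetool flush my_ks my_cf
    nodetool compact my_ks my_cf

You would presumably want to set gc_grace_seconds back to its usual value afterwards.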

> Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.
What snitch are you using?
If you are using the property file snitch, do all nodes have the same configuration?
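
If it were the PropertyFileSnitch, the thing to check would be that every node ships an identical cassandra-topology.properties, along these lines (the DC and rack assignments below are only illustrative, taken from the ring output above):

    192.168.2.100=Cassandra:rack1
    192.168.2.101=Analytics:rack1
    192.168.2.102=Analytics:rack1
    default=Analytics:rack1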

There is a lot of sickness there. If possible I would scrub and start again. 

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 6:29 AM, S C <as...@outlook.com> wrote:

> One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured this out pretty quickly, I have a few questions that I am looking for answers to.
> 
> We can always be proactive in keeping time in sync, but is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory).
> Are there any other scenarios that would lead a cluster to look like the output below? Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics nodes.
> 
> 
> On 192.168.2.100
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Cassandra   rack1       Up     Normal  601.34 MB       33.33%              0                                           
> 192.168.2.101  Analytics   rack1       Down   Normal  149.75 MB       33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485   
> 
> On 192.168.2.101
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
> 192.168.2.101  Analytics   rack1       Up     Normal  158.59 MB       33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Down   Normal  ?               33.33%              113427455640312821154458202477256070485    
> 
> On 192.168.2.102
> Address         DC          Rack        Status State   Load            Owns                Token                                       
>                                                                                            113427455640312821154458202477256070485     
> 192.168.2.100  Analytics   rack1       Down   Normal  ?               33.33%              0                                           
> 192.168.2.101  Analytics   rack1       Down   Normal  ?               33.33%              56713727820156410577229101238628035242      
> 192.168.2.102  Analytics   rack1       Up     Normal  117.02 MB       33.33%              113427455640312821154458202477256070485     
> 
> 
> Appreciate your valuable inputs.
> 
> Thanks,
> SC