Posted to user@cassandra.apache.org by Atul Saroha <at...@snapdeal.com> on 2016/09/29 08:39:43 UTC

[cassandra 3.6.] Nodetool Repair + tombstone behaviour

Hi,

We have seen some weird behaviour in Cassandra 3.6.
One of our nodes went down for more than 10 hours. After that, we ran
nodetool repair multiple times, but tombstones are not being synced properly
across the cluster. On a day-to-day basis, as soon as the grace period
expires, deleted records start surfacing again in Cassandra.

It seems nodetool repair is not syncing tombstones across the cluster.
FYI, we now have 3 data centres.

We just want help on how to verify and debug this issue. Any help will be
appreciated.
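
(For context, a check we could run to see whether a resurrected row exists
on only some replicas, sketched with hypothetical keyspace/table/column
names and a hypothetical key:)

    # List the replicas that own a resurrected key
    nodetool getendpoints my_keyspace my_table 'some_deleted_key'

    # Then query through each replica at LOCAL_ONE; the coordinator usually
    # answers from its own data at that consistency, so differing results
    # between nodes point at tombstones that never reached some replicas.
    cqlsh <replica_ip> -e "CONSISTENCY LOCAL_ONE; SELECT * FROM my_keyspace.my_table WHERE id = 'some_deleted_key';"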

-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Atul,

our fork has been tested on 2.1 and 3.0.x clusters.
I've just tested with a CCM 3.6 cluster and it worked with no issue.

With Reaper, if you set incremental to false, it'll perform a full subrange
repair with no anticompaction.
You'll see this message in the logs:

INFO  [AntiEntropyStage:1] 2016-09-29 16:11:34,950 ActiveRepairService.java:378 - Not a global repair, will not do anticompaction
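
(For example, assuming the default package log location, you could confirm
this on each node with something like:)

    # Log path is an assumption; adjust it for your install
    grep "Not a global repair, will not do anticompaction" /var/log/cassandra/system.log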

If you set incremental to true, it'll perform an incremental repair, one
node at a time, with anticompaction (with incremental repair, set
Parallelism to Parallel only).

Let me know how it goes.


On Thu, Sep 29, 2016 at 3:06 PM Atul Saroha <at...@snapdeal.com>
wrote:

> Hi Alexander,
>
> There is compatibility issue raised with spotify/cassandra-reaper for
> cassandra version 3.x.
> Is it comaptible with 3.6 in fork thelastpickle/cassandra-reaper ?
>
> There are some suggestions mentioned by *brstgt* which we can try on our
> side.
>
> On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha <at...@snapdeal.com>
> wrote:
>
>> Thanks Alexander.
>>
>> Will look into all these.
>>
>> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>>> Atul,
>>>
>>> since you're using 3.6, by default you're running incremental repair,
>>> which doesn't like concurrency very much.
>>> Validation errors are not occurring on a partition or partition range
>>> base, but if you're trying to run both anticompaction and validation
>>> compaction on the same SSTable.
>>>
>>> Like advised to Robert yesterday, and if you want to keep on running
>>> incremental repair, I'd suggest the following :
>>>
>>>    - run nodetool tpstats on all nodes in search for running/pending
>>>    repair sessions
>>>    - If you have some, and to be sure you will avoid conflicts, roll
>>>    restart your cluster (all nodes)
>>>    - Then, run "nodetool repair" on one node.
>>>    - When repair has finished on this node (track messages in the log
>>>    and nodetool tpstats), check if other nodes are running anticompactions
>>>    - If so, wait until they are over
>>>    - If not, move on to the other node
>>>
>>> You should be able to run concurrent incremental compactions on
>>> different tables if you wish to speed up the complete repair of the
>>> cluster, but do not try to repair the same table/full keyspace from two
>>> nodes at the same time.
>>>
>>> If you do not want to keep on using incremental repair, and fallback to
>>> classic full repair, I think the only way in 3.6 to avoid anticompaction
>>> will be to use subrange repair (Paulo mentioned that in 3.x full repair
>>> also triggers anticompaction).
>>>
>>> You have two options here : cassandra_range_repair (
>>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>>> Reaper (https://github.com/spotify/cassandra-reaper)
>>>
>>> cassandra_range_repair might scream about subrange + incremental not
>>> being compatible (not sure here), but you can modify the repair_range()
>>> method by adding a --full switch to the command line used to run repair.
>>>
>>> We have a fork of Reaper that handles both full subrange repair and
>>> incremental repair here :
>>> https://github.com/thelastpickle/cassandra-reaper
>>> It comes with a tweaked version of the UI made by Stephan Podkowinski (
>>> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
>>> interactions to schedule, run and track repair - which adds fields to run
>>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>>
>>> Cheers,
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <at...@snapdeal.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are not sure whether this issue is linked to that node or not. Our
>>>> application does frequent delete and insert.
>>>>
>>>> May be our approach is not correct for nodetool repair. Yes, we
>>>> generally fire repair on all boxes at same time. Till now, it was manual
>>>> with default configuration ( command: "nodetool repair").
>>>> Yes, we saw validation error but that is linked to already running
>>>> repair of  same partition on other box for same partition range. We saw
>>>> error validation failed with some ip as repair in already running for the
>>>> same SSTable.
>>>> Just few days back, we had 2 DCs with 3 nodes each and replication was
>>>> also 3. It means all data on each node.
>>>>
>>>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>>>> alex@thelastpickle.com> wrote:
>>>>
>>>>> Hi Atul,
>>>>>
>>>>> could you be more specific on how you are running repair ? What's the
>>>>> precise command line for that, does it run on several nodes at the same
>>>>> time, etc...
>>>>> What is your gc_grace_seconds ?
>>>>> Do you see errors in your logs that would be linked to repairs
>>>>> (Validation failure or failure to create a merkle tree)?
>>>>>
>>>>> You seem to mention a single node that went down but say the whole
>>>>> cluster seem to have zombie data.
>>>>> What is the connection you see between the node that went down and the
>>>>> fact that deleted data comes back to life ?
>>>>> What is your strategy for cyclic maintenance repair (schedule, command
>>>>> line or tool, etc...) ?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have seen a weird behaviour in cassandra 3.6.
>>>>>> Once our node was went down more than 10 hrs. After that, we had ran
>>>>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>>>>> over the cluster. On day- today basis, on expiry of every grace period,
>>>>>> deleted records start surfacing again in cassandra.
>>>>>>
>>>>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>>>>> FYI, we have 3 data centres now.
>>>>>>
>>>>>> Just want the help how to verify and debug this issue. Help will be
>>>>>> appreciated.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Atul Saroha
>>>>>>
>>>>>> *Lead Software Engineer | CAMS*
>>>>>>
>>>>>> M: +91 8447784271
>>>>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>>>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>>>>
>>>>>> --
>>>>> -----------------
>>>>> Alexander Dejanovski
>>>>> France
>>>>> @alexanderdeja
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Atul Saroha
>>>>
>>>> *Lead Software Engineer | CAMS*
>>>>
>>>> M: +91 8447784271
>>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>>
>>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Regards,
>> Atul Saroha
>>
>> *Lead Software Engineer | CAMS*
>>
>> M: +91 8447784271
>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>
>>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Atul Saroha <at...@snapdeal.com>.
Hi Alexander,

There is a compatibility issue raised against spotify/cassandra-reaper for
Cassandra 3.x.
Is the thelastpickle/cassandra-reaper fork compatible with 3.6?

There are some suggestions mentioned by *brstgt* which we can try on our
side.

On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha <at...@snapdeal.com>
wrote:

> Thanks Alexander.
>
> Will look into all these.
>
> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
> alex@thelastpickle.com> wrote:
>
>> Atul,
>>
>> since you're using 3.6, by default you're running incremental repair,
>> which doesn't like concurrency very much.
>> Validation errors are not occurring on a partition or partition range
>> base, but if you're trying to run both anticompaction and validation
>> compaction on the same SSTable.
>>
>> Like advised to Robert yesterday, and if you want to keep on running
>> incremental repair, I'd suggest the following :
>>
>>    - run nodetool tpstats on all nodes in search for running/pending
>>    repair sessions
>>    - If you have some, and to be sure you will avoid conflicts, roll
>>    restart your cluster (all nodes)
>>    - Then, run "nodetool repair" on one node.
>>    - When repair has finished on this node (track messages in the log
>>    and nodetool tpstats), check if other nodes are running anticompactions
>>    - If so, wait until they are over
>>    - If not, move on to the other node
>>
>> You should be able to run concurrent incremental compactions on different
>> tables if you wish to speed up the complete repair of the cluster, but do
>> not try to repair the same table/full keyspace from two nodes at the same
>> time.
>>
>> If you do not want to keep on using incremental repair, and fallback to
>> classic full repair, I think the only way in 3.6 to avoid anticompaction
>> will be to use subrange repair (Paulo mentioned that in 3.x full repair
>> also triggers anticompaction).
>>
>> You have two options here : cassandra_range_repair (
>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>> Reaper (https://github.com/spotify/cassandra-reaper)
>>
>> cassandra_range_repair might scream about subrange + incremental not
>> being compatible (not sure here), but you can modify the repair_range()
>> method by adding a --full switch to the command line used to run repair.
>>
>> We have a fork of Reaper that handles both full subrange repair and
>> incremental repair here: https://github.com/thelastpickle/cassandra-reaper
>> It comes with a tweaked version of the UI made by Stephan Podkowinski (
>> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
>> interactions to schedule, run and track repair - which adds fields to run
>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>
>> Cheers,
>>
>>
>>
>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <at...@snapdeal.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We are not sure whether this issue is linked to that node or not. Our
>>> application does frequent delete and insert.
>>>
>>> May be our approach is not correct for nodetool repair. Yes, we
>>> generally fire repair on all boxes at same time. Till now, it was manual
>>> with default configuration ( command: "nodetool repair").
>>> Yes, we saw validation error but that is linked to already running
>>> repair of  same partition on other box for same partition range. We saw
>>> error validation failed with some ip as repair in already running for the
>>> same SSTable.
>>> Just few days back, we had 2 DCs with 3 nodes each and replication was
>>> also 3. It means all data on each node.
>>>
>>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>>> alex@thelastpickle.com> wrote:
>>>
>>>> Hi Atul,
>>>>
>>>> could you be more specific on how you are running repair ? What's the
>>>> precise command line for that, does it run on several nodes at the same
>>>> time, etc...
>>>> What is your gc_grace_seconds ?
>>>> Do you see errors in your logs that would be linked to repairs
>>>> (Validation failure or failure to create a merkle tree)?
>>>>
>>>> You seem to mention a single node that went down but say the whole
>>>> cluster seem to have zombie data.
>>>> What is the connection you see between the node that went down and the
>>>> fact that deleted data comes back to life ?
>>>> What is your strategy for cyclic maintenance repair (schedule, command
>>>> line or tool, etc...) ?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have seen a weird behaviour in cassandra 3.6.
>>>>> Once our node was went down more than 10 hrs. After that, we had ran
>>>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>>>> over the cluster. On day- today basis, on expiry of every grace period,
>>>>> deleted records start surfacing again in cassandra.
>>>>>
>>>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>>>> FYI, we have 3 data centres now.
>>>>>
>>>>> Just want the help how to verify and debug this issue. Help will be
>>>>> appreciated.
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Atul Saroha
>>>>>
>>>>> *Lead Software Engineer | CAMS*
>>>>>
>>>>> M: +91 8447784271
>>>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>>>
>>>>> --
>>>> -----------------
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>
>>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
>


-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Atul Saroha <at...@snapdeal.com>.
Thanks Alexander.

Will look into all these.

On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
alex@thelastpickle.com> wrote:

> Atul,
>
> since you're using 3.6, by default you're running incremental repair,
> which doesn't like concurrency very much.
> Validation errors are not occurring on a partition or partition range
> base, but if you're trying to run both anticompaction and validation
> compaction on the same SSTable.
>
> Like advised to Robert yesterday, and if you want to keep on running
> incremental repair, I'd suggest the following :
>
>    - run nodetool tpstats on all nodes in search for running/pending
>    repair sessions
>    - If you have some, and to be sure you will avoid conflicts, roll
>    restart your cluster (all nodes)
>    - Then, run "nodetool repair" on one node.
>    - When repair has finished on this node (track messages in the log and
>    nodetool tpstats), check if other nodes are running anticompactions
>    - If so, wait until they are over
>    - If not, move on to the other node
>
> You should be able to run concurrent incremental compactions on different
> tables if you wish to speed up the complete repair of the cluster, but do
> not try to repair the same table/full keyspace from two nodes at the same
> time.
>
> If you do not want to keep on using incremental repair, and fallback to
> classic full repair, I think the only way in 3.6 to avoid anticompaction
> will be to use subrange repair (Paulo mentioned that in 3.x full repair
> also triggers anticompaction).
>
> You have two options here : cassandra_range_repair (https://github.com/
> BrianGallew/cassandra_range_repair) and Spotify Reaper (
> https://github.com/spotify/cassandra-reaper)
>
> cassandra_range_repair might scream about subrange + incremental not being
> compatible (not sure here), but you can modify the repair_range() method
> by adding a --full switch to the command line used to run repair.
>
> We have a fork of Reaper that handles both full subrange repair and
> incremental repair here : https://github.com/
> thelastpickle/cassandra-reaper
> It comes with a tweaked version of the UI made by Stephan Podkowinski (
> https://github.com/spodkowinski/cassandra-reaper-ui) - that eases
> interactions to schedule, run and track repair - which adds fields to run
> incremental repair (accessible via ...:8080/webui/ in your browser).
>
> Cheers,
>
>
>
> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <at...@snapdeal.com>
> wrote:
>
>> Hi,
>>
>> We are not sure whether this issue is linked to that node or not. Our
>> application does frequent delete and insert.
>>
>> May be our approach is not correct for nodetool repair. Yes, we generally
>> fire repair on all boxes at same time. Till now, it was manual with default
>> configuration ( command: "nodetool repair").
>> Yes, we saw validation error but that is linked to already running repair
>> of  same partition on other box for same partition range. We saw error
>> validation failed with some ip as repair in already running for the same
>> SSTable.
>> Just few days back, we had 2 DCs with 3 nodes each and replication was
>> also 3. It means all data on each node.
>>
>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>>> Hi Atul,
>>>
>>> could you be more specific on how you are running repair ? What's the
>>> precise command line for that, does it run on several nodes at the same
>>> time, etc...
>>> What is your gc_grace_seconds ?
>>> Do you see errors in your logs that would be linked to repairs
>>> (Validation failure or failure to create a merkle tree)?
>>>
>>> You seem to mention a single node that went down but say the whole
>>> cluster seem to have zombie data.
>>> What is the connection you see between the node that went down and the
>>> fact that deleted data comes back to life ?
>>> What is your strategy for cyclic maintenance repair (schedule, command
>>> line or tool, etc...) ?
>>>
>>> Thanks,
>>>
>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have seen a weird behaviour in cassandra 3.6.
>>>> Once our node was went down more than 10 hrs. After that, we had ran
>>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>>> over the cluster. On day- today basis, on expiry of every grace period,
>>>> deleted records start surfacing again in cassandra.
>>>>
>>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>>> FYI, we have 3 data centres now.
>>>>
>>>> Just want the help how to verify and debug this issue. Help will be
>>>> appreciated.
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Atul Saroha
>>>>
>>>> *Lead Software Engineer | CAMS*
>>>>
>>>> M: +91 8447784271
>>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>>
>>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Regards,
>> Atul Saroha
>>
>> *Lead Software Engineer | CAMS*
>>
>> M: +91 8447784271
>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Atul,

since you're using 3.6, by default you're running incremental repair, which
doesn't like concurrency very much.
Validation errors don't occur on a per-partition or partition-range basis,
but when anticompaction and a validation compaction are attempted on the
same SSTable.

As advised to Robert yesterday, if you want to keep running incremental
repair, I'd suggest the following (a rough command sketch follows the list):

   - Run nodetool tpstats on all nodes in search of running/pending repair
   sessions
   - If you find some, and to be sure you avoid conflicts, do a rolling
   restart of your cluster (all nodes)
   - Then run "nodetool repair" on one node.
   - When repair has finished on that node (track messages in the log and
   nodetool tpstats), check whether other nodes are running anticompactions
   - If so, wait until they are over
   - If not, move on to the next node
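
(A rough sketch of those checks, with a hypothetical keyspace name; run the
first and last commands against every node in turn:)

    # 1. Look for active/pending tasks in the repair-related thread pools
    nodetool -h <node_ip> tpstats | grep -Ei 'repair|antientropy|validation'

    # 2. After the rolling restart, run repair from one node only
    nodetool repair my_keyspace

    # 3. Before moving on, make sure no node is still anticompacting
    #    (anticompactions show up in compactionstats)
    nodetool -h <node_ip> compactionstats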

You should be able to run concurrent incremental repairs on different
tables if you wish to speed up the complete repair of the cluster, but do
not try to repair the same table or full keyspace from two nodes at the
same time.

If you do not want to keep using incremental repair and would rather fall
back to classic full repair, I think the only way in 3.6 to avoid
anticompaction is to use subrange repair (Paulo mentioned that in 3.x full
repair also triggers anticompaction).
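
(At the nodetool level, a full subrange repair looks something like the
sketch below, with hypothetical tokens and keyspace/table names; the tools
mentioned next compute and iterate over the token subranges for you:)

    # Full (non-incremental) repair of a single token subrange;
    # -st/-et are the start and end tokens of the subrange (hypothetical values)
    nodetool repair --full -st -9223372036854775808 -et -9100000000000000000 my_keyspace my_table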

You have two options here: cassandra_range_repair (
https://github.com/BrianGallew/cassandra_range_repair) and Spotify Reaper (
https://github.com/spotify/cassandra-reaper)

cassandra_range_repair might scream about subrange + incremental not being
compatible (not sure here), but you can modify the repair_range() method by
adding a --full switch to the command line used to run repair.

We have a fork of Reaper that handles both full subrange repair and
incremental repair here: https://github.com/thelastpickle/cassandra-reaper
It comes with a tweaked version of the UI made by Stephan Podkowinski (
https://github.com/spodkowinski/cassandra-reaper-ui), which eases
scheduling, running and tracking repairs, and adds fields to run
incremental repair (accessible via ...:8080/webui/ in your browser).

Cheers,



On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <at...@snapdeal.com>
wrote:

> Hi,
>
> We are not sure whether this issue is linked to that node or not. Our
> application does frequent delete and insert.
>
> May be our approach is not correct for nodetool repair. Yes, we generally
> fire repair on all boxes at same time. Till now, it was manual with default
> configuration ( command: "nodetool repair").
> Yes, we saw validation error but that is linked to already running repair
> of  same partition on other box for same partition range. We saw error
> validation failed with some ip as repair in already running for the same
> SSTable.
> Just few days back, we had 2 DCs with 3 nodes each and replication was
> also 3. It means all data on each node.
>
> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
> alex@thelastpickle.com> wrote:
>
>> Hi Atul,
>>
>> could you be more specific on how you are running repair ? What's the
>> precise command line for that, does it run on several nodes at the same
>> time, etc...
>> What is your gc_grace_seconds ?
>> Do you see errors in your logs that would be linked to repairs
>> (Validation failure or failure to create a merkle tree)?
>>
>> You seem to mention a single node that went down but say the whole
>> cluster seem to have zombie data.
>> What is the connection you see between the node that went down and the
>> fact that deleted data comes back to life ?
>> What is your strategy for cyclic maintenance repair (schedule, command
>> line or tool, etc...) ?
>>
>> Thanks,
>>
>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We have seen a weird behaviour in cassandra 3.6.
>>> Once our node was went down more than 10 hrs. After that, we had ran
>>> Nodetool repair multiple times. But tombstone are not getting sync properly
>>> over the cluster. On day- today basis, on expiry of every grace period,
>>> deleted records start surfacing again in cassandra.
>>>
>>> It seems Nodetool repair in not syncing tomebstone across cluster.
>>> FYI, we have 3 data centres now.
>>>
>>> Just want the help how to verify and debug this issue. Help will be
>>> appreciated.
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>>
>>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Atul Saroha <at...@snapdeal.com>.
Hi,

We are not sure whether this issue is linked to that node or not. Our
application does frequent deletes and inserts.

Maybe our approach to nodetool repair is not correct. Yes, we generally
fire repair on all boxes at the same time. Till now, it was manual with the
default configuration (command: "nodetool repair").
Yes, we saw validation errors, but those are linked to a repair of the same
partition range already running on another box: the error said validation
failed for some IP because a repair was already running for the same
SSTable.
Just a few days back, we had 2 DCs with 3 nodes each and the replication
factor was also 3, which means all the data is on each node.

On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
alex@thelastpickle.com> wrote:

> Hi Atul,
>
> could you be more specific on how you are running repair ? What's the
> precise command line for that, does it run on several nodes at the same
> time, etc...
> What is your gc_grace_seconds ?
> Do you see errors in your logs that would be linked to repairs (Validation
> failure or failure to create a merkle tree)?
>
> You seem to mention a single node that went down but say the whole cluster
> seem to have zombie data.
> What is the connection you see between the node that went down and the
> fact that deleted data comes back to life ?
> What is your strategy for cyclic maintenance repair (schedule, command
> line or tool, etc...) ?
>
> Thanks,
>
> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
> wrote:
>
>> Hi,
>>
>> We have seen a weird behaviour in cassandra 3.6.
>> Once our node was went down more than 10 hrs. After that, we had ran
>> Nodetool repair multiple times. But tombstone are not getting sync properly
>> over the cluster. On day- today basis, on expiry of every grace period,
>> deleted records start surfacing again in cassandra.
>>
>> It seems Nodetool repair in not syncing tomebstone across cluster.
>> FYI, we have 3 data centres now.
>>
>> Just want the help how to verify and debug this issue. Help will be
>> appreciated.
>>
>>
>> --
>> Regards,
>> Atul Saroha
>>
>> *Lead Software Engineer | CAMS*
>>
>> M: +91 8447784271
>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV,Gurgaon, Haryana, India

Re: [cassandra 3.6.] Nodetool Repair + tombstone behaviour

Posted by Alexander Dejanovski <al...@thelastpickle.com>.
Hi Atul,

could you be more specific about how you are running repair? What's the
precise command line for that, does it run on several nodes at the same
time, etc.?
What is your gc_grace_seconds?
Do you see errors in your logs that would be linked to repairs (validation
failure, or failure to create a merkle tree)?
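
(For reference, gc_grace_seconds can be checked per table with something
like the following, assuming cqlsh access and hypothetical keyspace/table
names:)

    # Show the tombstone grace period for one table (default is 864000 s, i.e. 10 days)
    cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';"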

You seem to mention a single node that went down, but say the whole cluster
seems to have zombie data.
What connection do you see between the node that went down and the fact
that deleted data comes back to life?
What is your strategy for regular maintenance repair (schedule, command line
or tool, etc.)?

Thanks,

On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <at...@snapdeal.com>
wrote:

> Hi,
>
> We have seen a weird behaviour in cassandra 3.6.
> Once our node was went down more than 10 hrs. After that, we had ran
> Nodetool repair multiple times. But tombstone are not getting sync properly
> over the cluster. On day- today basis, on expiry of every grace period,
> deleted records start surfacing again in cassandra.
>
> It seems Nodetool repair in not syncing tomebstone across cluster.
> FYI, we have 3 data centres now.
>
> Just want the help how to verify and debug this issue. Help will be
> appreciated.
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV,Gurgaon, Haryana, India
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com