Posted to user@cassandra.apache.org by Anuj Wadehra <an...@yahoo.co.in> on 2015/06/06 22:24:40 UTC
Hundreds of sstables after every Repair
Hi,
We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny SSTables (<10 KB) get created, and these SSTables never get compacted because of the coldness issue. I have raised https://issues.apache.org/jira/browse/CASSANDRA-9146 for this, but I have been told to upgrade. Until we upgrade to the latest 2.0.x we are stuck; an upgrade takes time, testing and planning in production systems :(
I have observed that even if vnodes are NOT damaged, hundreds of tiny SSTables are created during repair for a wide-row CF. This is beyond my understanding: if everything is consistent, and for the entire repair process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent for <CF>", what is the need of creating SSTables?
Is there any alternative to regular major compaction to deal with this situation?
Thanks
Anuj Wadehra
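For reference, the buildup described in the post above can be quantified on a node by counting SSTable data files under a size threshold, grouped by keyspace/table directory. The data directory path and the 10 KB threshold below are assumptions to adjust for your installation; this is a diagnostic sketch, not a supported tool:

```python
import os
from collections import Counter

def tiny_sstables(data_dir, threshold=10 * 1024):
    """Count SSTable data files smaller than `threshold` bytes,
    grouped by their keyspace/table directory under `data_dir`."""
    counts = Counter()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith("-Data.db"):  # one Data.db component per SSTable
                path = os.path.join(root, name)
                if os.path.getsize(path) < threshold:
                    counts[os.path.relpath(root, data_dir)] += 1
    return counts

# Example (the path is an assumption; use your cassandra.yaml data_file_directories):
# for table, n in tiny_sstables("/var/lib/cassandra/data").most_common():
#     print(table, n)
```

Running this before and after a repair shows which CFs are accumulating the tiny SSTables.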
Re: Hundreds of sstables after every Repair
Posted by Anuj Wadehra <an...@yahoo.co.in>.
NTP output attached. Any other comments on the two queries?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Tue, 9 Jun, 2015 at 10:59 pm
Subject:Re: Hundreds of sstables after every Repair
Yes, we use NTP. We also thought that drift might be creating the problems. Our NTP output is as follows:
[root@node1 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 237 1024 377 1.199 0.062 0.554
*10.x.x.x 10.x.x.x 2 u 178 1024 377 0.479 -0.350 0.626
[root@node2 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 124 1024 377 0.939 -0.001 0.614
*10.x.x.x 10.x.x.x 2 u 722 1024 377 0.567 -0.241 0.585
[root@node3 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 514 1024 377 0.716 -0.103 1.315
*10.x.x.x 10.x.x.x 2 u 21 1024 377 0.402 -0.262 1.070
***IPs are masked
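As a quick sanity check on output like the above, the peer offsets (the ninth column of `ntpq -p`, in milliseconds) can be parsed and compared against a tolerance. The 50 ms default below is an arbitrary assumption for illustration; the offsets shown above are all well under 1 ms, so clock drift looks like an unlikely culprit here. A minimal sketch:

```python
def peer_offsets_ms(ntpq_output):
    """Extract peer clock offsets (milliseconds) from `ntpq -p` output.

    Heuristic: peer lines have exactly 10 whitespace-separated fields,
    with the offset in the ninth; the column-header line is skipped
    because its ninth field ("offset") is not a number."""
    offsets = []
    for line in ntpq_output.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # shell prompt, "====" rule, or blank line
        try:
            offsets.append(float(fields[8]))
        except ValueError:
            pass  # column-header line
    return offsets

def clocks_look_synced(ntpq_output, tolerance_ms=50.0):
    """True if every peer offset is within `tolerance_ms` (assumed threshold)."""
    offsets = peer_offsets_ms(ntpq_output)
    return bool(offsets) and max(abs(o) for o in offsets) <= tolerance_ms
```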
Thanks
Anuj Wadehra
On Tuesday, 9 June 2015 9:12 PM, Carlos Rolo <ro...@pythian.com> wrote:
Hello,
Do you have your clocks synced across your cluster? Are you using NTP and have it properly configured?
Sometimes clocks that are out of sync can trigger weird behaviour.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com
On Tue, Jun 9, 2015 at 5:11 PM, Anuj Wadehra <an...@yahoo.co.in> wrote:
We were facing dropped mutations earlier, so we increased flush writers; now there are no dropped mutations in tpstats. To repair the damaged vnodes / inconsistent data, we executed repair -pr on all nodes. Still, we see the same problem.
When we analyze repair logs we see 2 strange things:
1. "Out of sync" ranges for CFs which are not being actively written/updated while the repair is going on. When we repaired all data via repair -pr on all nodes, why is data still out of sync?
2. For some CFs, the repair logs show that all ranges are consistent, yet many SSTables are still created during repair. When everything is in sync, why does repair create tiny SSTables?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Ken Hancock" <ke...@schange.com>
Date:Tue, 9 Jun, 2015 at 8:24 pm
Subject:Re: Hundreds of sstables after every Repair
I think this came up recently in another thread. If you're getting large numbers of SSTables after repairs, that means that your nodes are diverging from the keys they're supposed to be holding. Likely you're dropping mutations. Do a nodetool tpstats on each of your nodes and look at the dropped-mutation counters. If you're seeing dropped messages, my money is on a non-zero FlushWriter "All time blocked" stat, which causes mutations to be dropped.
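The two indicators named above can be pulled out of `nodetool tpstats` output mechanically. The sketch below assumes the 2.0-era layout, where the FlushWriter pool line ends with its "All time blocked" count and the dropped-message section lists `MUTATION` followed by its count; column positions vary between Cassandra versions, so treat this as illustrative:

```python
def tpstats_indicators(tpstats_output):
    """Return (dropped_mutations, flushwriter_all_time_blocked) parsed from
    `nodetool tpstats` output, or None for a value that was not found."""
    dropped = blocked = None
    for line in tpstats_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MUTATION":        # dropped-message section
            dropped = int(fields[-1])
        elif fields[0] == "FlushWriter":   # last column is "All time blocked"
            blocked = int(fields[-1])
    return dropped, blocked
```

If both numbers are non-zero, raising `memtable_flush_writers` (as Anuj describes doing) is the usual first response.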
On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in> wrote:
Any suggestions or comments on this one?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Sun, 7 Jun, 2015 at 1:54 am
Subject:Hundreds of sstables after every Repair
Hi,
We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny SSTables (<10 KB) get created, and these SSTables never get compacted because of the coldness issue. I have raised https://issues.apache.org/jira/browse/CASSANDRA-9146 for this, but I have been told to upgrade. Until we upgrade to the latest 2.0.x we are stuck; an upgrade takes time, testing and planning in production systems :(
I have observed that even if vnodes are NOT damaged, hundreds of tiny SSTables are created during repair for a wide-row CF. This is beyond my understanding: if everything is consistent, and for the entire repair process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent for <CF>", what is the need of creating SSTables?
Is there any alternative to regular major compaction to deal with this situation?
Thanks
Anuj Wadehra
--
Re: Hundreds of sstables after every Repair
Posted by Anuj Wadehra <an...@yahoo.co.in>.
Yes, we use NTP. We also thought that drift might be creating the problems. Our NTP output is as follows:
[root@node1 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 237 1024 377 1.199 0.062 0.554
*10.x.x.x 10.x.x.x 2 u 178 1024 377 0.479 -0.350 0.626
[root@node2 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 124 1024 377 0.939 -0.001 0.614
*10.x.x.x 10.x.x.x 2 u 722 1024 377 0.567 -0.241 0.585
[root@node3 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.x.x.x 10.x.x.x 2 u 514 1024 377 0.716 -0.103 1.315
*10.x.x.x 10.x.x.x 2 u 21 1024 377 0.402 -0.262 1.070
***IPs are masked
Thanks
Anuj Wadehra
On Tuesday, 9 June 2015 9:12 PM, Carlos Rolo <ro...@pythian.com> wrote:
Hello,
Do you have your clocks synced across your cluster? Are you using NTP and have it properly configured?
Sometimes clocks that are out of sync can trigger weird behaviour.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com
On Tue, Jun 9, 2015 at 5:11 PM, Anuj Wadehra <an...@yahoo.co.in> wrote:
We were facing dropped mutations earlier, so we increased flush writers; now there are no dropped mutations in tpstats. To repair the damaged vnodes / inconsistent data, we executed repair -pr on all nodes. Still, we see the same problem.
When we analyze repair logs we see 2 strange things:
1. "Out of sync" ranges for CFs which are not being actively written/updated while the repair is going on. When we repaired all data via repair -pr on all nodes, why is data still out of sync?
2. For some CFs, the repair logs show that all ranges are consistent, yet many SSTables are still created during repair. When everything is in sync, why does repair create tiny SSTables?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Ken Hancock" <ke...@schange.com>
Date:Tue, 9 Jun, 2015 at 8:24 pm
Subject:Re: Hundreds of sstables after every Repair
I think this came up recently in another thread. If you're getting large numbers of SSTables after repairs, that means that your nodes are diverging from the keys they're supposed to be holding. Likely you're dropping mutations. Do a nodetool tpstats on each of your nodes and look at the dropped-mutation counters. If you're seeing dropped messages, my money is on a non-zero FlushWriter "All time blocked" stat, which causes mutations to be dropped.
On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in> wrote:
Any suggestions or comments on this one?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Sun, 7 Jun, 2015 at 1:54 am
Subject:Hundreds of sstables after every Repair
Hi,
We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny SSTables (<10 KB) get created, and these SSTables never get compacted because of the coldness issue. I have raised https://issues.apache.org/jira/browse/CASSANDRA-9146 for this, but I have been told to upgrade. Until we upgrade to the latest 2.0.x we are stuck; an upgrade takes time, testing and planning in production systems :(
I have observed that even if vnodes are NOT damaged, hundreds of tiny SSTables are created during repair for a wide-row CF. This is beyond my understanding: if everything is consistent, and for the entire repair process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent for <CF>", what is the need of creating SSTables?
Is there any alternative to regular major compaction to deal with this situation?
Thanks
Anuj Wadehra
--
Re: Hundreds of sstables after every Repair
Posted by Carlos Rolo <ro...@pythian.com>.
Hello,
Do you have your clocks synced across your cluster? Are you using NTP and
have it properly configured?
Sometimes clocks that are out of sync can trigger weird behaviour.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com
On Tue, Jun 9, 2015 at 5:11 PM, Anuj Wadehra <an...@yahoo.co.in> wrote:
> We were facing dropped mutations earlier and we increased flush writers.
> Now there are no dropped mutations in tpstats. To repair the damaged vnodes
> / inconsistent data we executed repair -pr on all nodes. Still, we see the
> same problem.
>
> When we analyze repair logs we see 2 strange things:
>
> 1. "Out of sync" ranges for cf which are not being actively being
> written/updated while the repair is going on. When we repaired all data by
> repair -pr on all nodes, why out of sync data?
>
> 2. For some cf , repair logs shows that all ranges are consistent. Still
> we get so many sstables created during repair. When everything is in sync ,
> why repair creates tiny sstables to repair data?
>
> Thanks
> Anuj Wadehra
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> ------------------------------
> *From*:"Ken Hancock" <ke...@schange.com>
> *Date*:Tue, 9 Jun, 2015 at 8:24 pm
> *Subject*:Re: Hundreds of sstables after every Repair
>
> I think this came up recently in another thread. If you're getting large
> numbers of SSTables after repairs, that means that your nodes are diverging
> from the keys that they're supposed to be having. Likely you're dropping
> mutations. Do a nodetool tpstats on each of your nodes and look at the
> mutation droppped counters. If you're seeing dropped message, my money you
> have a non-zero FlushWriter "All time blocked" stat which is causing
> mutations to be dropped.
>
>
>
> On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in>
> wrote:
>
>> Any suggestions or comments on this one?
>>
>> Thanks
>> Anuj Wadehra
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>> ------------------------------
>> *From*:"Anuj Wadehra" <an...@yahoo.co.in>
>> *Date*:Sun, 7 Jun, 2015 at 1:54 am
>> *Subject*:Hundreds of sstables after every Repair
>>
>> Hi,
>>
>> We are using 2.0.3 and vnodes. After every repair -pr operation 50+ tiny
>> sstables( <10K) get created. And these sstables never get compacted due to
>> coldness issue. I have raised
>> https://issues.apache.org/jira/browse/CASSANDRA-9146 for this issue but
>> I have been told to upgrade. Till we upgrade to latest 2.0.x , we are
>> stuck. Upgrade takes time, testing and planning in Production systems :(
>>
>> I have observed that even if vnodes are NOT damaged, hundreds of tiny
>> sstables are created during repair for a wide row CF. This is beyond my
>> understanding. If everything is consistent, and for the entire repair
>> process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent
>> for <CF>". Whats the need of creating sstables?
>>
>> Is there any alternative to regular major compaction to deal with
>> situation?
>>
>>
>> Thanks
>> Anuj Wadehra
--
Re: Hundreds of sstables after every Repair
Posted by Ken Hancock <ke...@schange.com>.
Perhaps running sstable2json on some of the small SSTables would shed some light. I was going to suggest the anticompaction feature of C* 2.1 (which I'm not familiar with), but you're on 2.0.
On Tue, Jun 9, 2015 at 11:11 AM, Anuj Wadehra <an...@yahoo.co.in>
wrote:
> We were facing dropped mutations earlier and we increased flush writers.
> Now there are no dropped mutations in tpstats. To repair the damaged vnodes
> / inconsistent data we executed repair -pr on all nodes. Still, we see the
> same problem.
>
> When we analyze repair logs we see 2 strange things:
>
> 1. "Out of sync" ranges for cf which are not being actively being
> written/updated while the repair is going on. When we repaired all data by
> repair -pr on all nodes, why out of sync data?
>
> 2. For some cf , repair logs shows that all ranges are consistent. Still
> we get so many sstables created during repair. When everything is in sync ,
> why repair creates tiny sstables to repair data?
>
> Thanks
> Anuj Wadehra
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> ------------------------------
> *From*:"Ken Hancock" <ke...@schange.com>
> *Date*:Tue, 9 Jun, 2015 at 8:24 pm
> *Subject*:Re: Hundreds of sstables after every Repair
>
> I think this came up recently in another thread. If you're getting large
> numbers of SSTables after repairs, that means that your nodes are diverging
> from the keys that they're supposed to be having. Likely you're dropping
> mutations. Do a nodetool tpstats on each of your nodes and look at the
> mutation droppped counters. If you're seeing dropped message, my money you
> have a non-zero FlushWriter "All time blocked" stat which is causing
> mutations to be dropped.
>
>
>
> On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in>
> wrote:
>
>> Any suggestions or comments on this one?
>>
>> Thanks
>> Anuj Wadehra
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>> ------------------------------
>> *From*:"Anuj Wadehra" <an...@yahoo.co.in>
>> *Date*:Sun, 7 Jun, 2015 at 1:54 am
>> *Subject*:Hundreds of sstables after every Repair
>>
>> Hi,
>>
>> We are using 2.0.3 and vnodes. After every repair -pr operation 50+ tiny
>> sstables( <10K) get created. And these sstables never get compacted due to
>> coldness issue. I have raised
>> https://issues.apache.org/jira/browse/CASSANDRA-9146 for this issue but
>> I have been told to upgrade. Till we upgrade to latest 2.0.x , we are
>> stuck. Upgrade takes time, testing and planning in Production systems :(
>>
>> I have observed that even if vnodes are NOT damaged, hundreds of tiny
>> sstables are created during repair for a wide row CF. This is beyond my
>> understanding. If everything is consistent, and for the entire repair
>> process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent
>> for <CF>". Whats the need of creating sstables?
>>
>> Is there any alternative to regular major compaction to deal with
>> situation?
>>
>>
>> Thanks
>> Anuj Wadehra
Re: Hundreds of sstables after every Repair
Posted by Anuj Wadehra <an...@yahoo.co.in>.
We were facing dropped mutations earlier, so we increased flush writers; now there are no dropped mutations in tpstats. To repair the damaged vnodes / inconsistent data, we executed repair -pr on all nodes. Still, we see the same problem.
When we analyze repair logs we see 2 strange things:
1. "Out of sync" ranges for CFs which are not being actively written/updated while the repair is going on. When we repaired all data via repair -pr on all nodes, why is data still out of sync?
2. For some CFs, the repair logs show that all ranges are consistent, yet many SSTables are still created during repair. When everything is in sync, why does repair create tiny SSTables?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Ken Hancock" <ke...@schange.com>
Date:Tue, 9 Jun, 2015 at 8:24 pm
Subject:Re: Hundreds of sstables after every Repair
I think this came up recently in another thread. If you're getting large numbers of SSTables after repairs, that means that your nodes are diverging from the keys they're supposed to be holding. Likely you're dropping mutations. Do a nodetool tpstats on each of your nodes and look at the dropped-mutation counters. If you're seeing dropped messages, my money is on a non-zero FlushWriter "All time blocked" stat, which causes mutations to be dropped.
On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in> wrote:
Any suggestions or comments on this one?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Sun, 7 Jun, 2015 at 1:54 am
Subject:Hundreds of sstables after every Repair
Hi,
We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny SSTables (<10 KB) get created, and these SSTables never get compacted because of the coldness issue. I have raised https://issues.apache.org/jira/browse/CASSANDRA-9146 for this, but I have been told to upgrade. Until we upgrade to the latest 2.0.x we are stuck; an upgrade takes time, testing and planning in production systems :(
I have observed that even if vnodes are NOT damaged, hundreds of tiny SSTables are created during repair for a wide-row CF. This is beyond my understanding: if everything is consistent, and for the entire repair process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent for <CF>", what is the need of creating SSTables?
Is there any alternative to regular major compaction to deal with this situation?
Thanks
Anuj Wadehra
Re: Hundreds of sstables after every Repair
Posted by Ken Hancock <ke...@schange.com>.
I think this came up recently in another thread. If you're getting large
numbers of SSTables after repairs, that means that your nodes are diverging
from the keys they're supposed to be holding. Likely you're dropping
mutations. Do a nodetool tpstats on each of your nodes and look at the
dropped-mutation counters. If you're seeing dropped messages, my money is on
a non-zero FlushWriter "All time blocked" stat, which causes mutations to be
dropped.
On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra <an...@yahoo.co.in>
wrote:
> Any suggestions or comments on this one?
>
> Thanks
> Anuj Wadehra
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> ------------------------------
> *From*:"Anuj Wadehra" <an...@yahoo.co.in>
> *Date*:Sun, 7 Jun, 2015 at 1:54 am
> *Subject*:Hundreds of sstables after every Repair
>
> Hi,
>
> We are using 2.0.3 and vnodes. After every repair -pr operation 50+ tiny
> sstables( <10K) get created. And these sstables never get compacted due to
> coldness issue. I have raised
> https://issues.apache.org/jira/browse/CASSANDRA-9146 for this issue but I
> have been told to upgrade. Till we upgrade to latest 2.0.x , we are stuck.
> Upgrade takes time, testing and planning in Production systems :(
>
> I have observed that even if vnodes are NOT damaged, hundreds of tiny
> sstables are created during repair for a wide row CF. This is beyond my
> understanding. If everything is consistent, and for the entire repair
> process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent
> for <CF>". Whats the need of creating sstables?
>
> Is there any alternative to regular major compaction to deal with
> situation?
>
>
> Thanks
> Anuj Wadehra
Re: Hundreds of sstables after every Repair
Posted by Anuj Wadehra <an...@yahoo.co.in>.
Any suggestions or comments on this one?
Thanks
Anuj Wadehra
Sent from Yahoo Mail on Android
From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Sun, 7 Jun, 2015 at 1:54 am
Subject:Hundreds of sstables after every Repair
Hi,
We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny SSTables (<10 KB) get created, and these SSTables never get compacted because of the coldness issue. I have raised https://issues.apache.org/jira/browse/CASSANDRA-9146 for this, but I have been told to upgrade. Until we upgrade to the latest 2.0.x we are stuck; an upgrade takes time, testing and planning in production systems :(
I have observed that even if vnodes are NOT damaged, hundreds of tiny SSTables are created during repair for a wide-row CF. This is beyond my understanding: if everything is consistent, and for the entire repair process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent for <CF>", what is the need of creating SSTables?
Is there any alternative to regular major compaction to deal with this situation?
Thanks
Anuj Wadehra