Posted to user@cassandra.apache.org by Eunsu Kim <eu...@gmail.com> on 2018/12/24 04:53:35 UTC

Data growth is abnormal

Merry Christmas

The Cassandra cluster I operate consists of two datacenters.

Most data has a TTL of 14 days, and one replica is stored in each datacenter (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1).

However, for the past few days, disk usage has been increasing rapidly in datacenter1 only.

Running nodetool cleanup on each node of datacenter1 makes no difference.

How does this happen? What can I do?

I would appreciate your advice.

Thank you in advance.




Re: Data growth is abnormal

Posted by Eunsu Kim <eu...@gmail.com>.
I solved this problem by setting compaction sub-properties (unchecked_tombstone_compaction, tombstone_threshold, tombstone_compaction_interval).

It took time, but eventually the two datacenters were balanced again.
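
A minimal sketch of what such a change might look like, assuming a hypothetical table my_keyspace.my_table. The window settings and numeric values are illustrative placeholders, not the settings actually used in this thread. The Python below only builds the CQL text and does not connect to a cluster. Because setting compaction replaces the whole option map, the strategy class is restated alongside the three sub-properties.

```python
def build_alter_compaction(table, options):
    """Return a CQL ALTER TABLE statement that sets the table's compaction map."""
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


# All values below are assumptions chosen for illustration only.
compaction_options = {
    "class": "TimeWindowCompactionStrategy",
    "compaction_window_unit": "DAYS",
    "compaction_window_size": "1",
    # Run single-sstable tombstone compactions without the usual overlap pre-check.
    "unchecked_tombstone_compaction": "true",
    # Ratio of droppable tombstones that makes an sstable a candidate.
    "tombstone_threshold": "0.2",
    # Minimum sstable age, in seconds, before it is considered.
    "tombstone_compaction_interval": "86400",
}

statement = build_alter_compaction("my_keyspace.my_table", compaction_options)
print(statement)
```

The printed statement can be pasted into cqlsh. Lower thresholds and shorter intervals make tombstone compactions more frequent, at the cost of extra compaction I/O.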

Thank you.

> On 24 Dec 2018, at 3:48 PM, Eunsu Kim <eu...@gmail.com> wrote:
> 
> Oh, I’m sorry.
> The patch is marked as included in 3.11.1.
> I seem to have confused it with other comments in the middle of the issue.
> However, I am not sure what to do with this page.
> 
>> On 24 Dec 2018, at 3:35 PM, Eunsu Kim <eunsu.bill23@gmail.com> wrote:
>> 
>> Thank you for your response.
>> 
>> The patch for the issue page you linked to may not be included in 3.11.3.
>> 
>> If I run repair -pr on all nodes, will both datacenters use the same amount of disk?
>> 
>>> On 24 Dec 2018, at 2:25 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>> 
>>> Seems like this is getting asked more and more, that’s unfortunate. Wish I had time to fix this by making flush smarter or TWCS split old data. But I don’t. 
>>> 
>>> You can search the list archives for more examples, but what’s probably happening is that you have sstables overlapping, which prevents TWCS from dropping them when fully expired.
>>> 
>>> The overlaps probably come from either probabilistic read repair or speculative retry read-repairing data into the memtable on the DC that coordinates your reads.
>>> 
>>> CASSANDRA-13418 (https://issues.apache.org/jira/browse/CASSANDRA-13418) makes it so you can force sstables to be dropped at expiration regardless of overlaps, but you have to set some properties because it’s technically unsafe (if you write to the table with anything other than TTLs).
>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>> On Dec 24, 2018, at 12:05 AM, Eunsu Kim <eunsu.bill23@gmail.com> wrote:
>>> 
>>>> I’m using TimeWindowCompactionStrategy.
>>>> 
>>>> All consistency levels are ONE.
>>>> 
>>>>> On 24 Dec 2018, at 2:01 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>>>> 
>>>>> What compaction strategy are you using?
>>>>> 
>>>>> What consistency level do you use on writes? Reads? 
>>>>> 
>>>>> -- 
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Dec 23, 2018, at 11:53 PM, Eunsu Kim <eunsu.bill23@gmail.com> wrote:
>>>>>> 
>>>>>> Merry Christmas
>>>>>> 
>>>>>> The Cassandra cluster I operate consists of two datacenters.
>>>>>> 
>>>>>> Most data has a TTL of 14 days, and one replica is stored in each datacenter (NetworkTopologyStrategy, datacenter1: 1, datacenter2: 1).
>>>>>> 
>>>>>> However, for the past few days, disk usage has been increasing rapidly in datacenter1 only.
>>>>>> 
>>>>>> Running nodetool cleanup on each node of datacenter1 makes no difference.
>>>>>> 
>>>>>> How does this happen? What can I do?
>>>>>> 
>>>>>> I would appreciate your advice.
>>>>>> 
>>>>>> Thank you in advance.
>>>>>> 
>>>>>> <PastedGraphic-2.png>
>>>>>> 
>>>>>> <PastedGraphic-1.png>
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>> 
>> 
> 


Re: Data growth is abnormal

Posted by Eunsu Kim <eu...@gmail.com>.
Oh, I’m sorry.
The patch is marked as included in 3.11.1.
I seem to have confused it with other comments in the middle of the issue.
However, I am not sure what to do with this page.



Re: Data growth is abnormal

Posted by Eunsu Kim <eu...@gmail.com>.
Thank you for your response.

The patch for the issue page you linked to may not be included in 3.11.3.

If I run repair -pr on all nodes, will both datacenters use the same amount of disk?



Re: Data growth is abnormal

Posted by Jeff Jirsa <jj...@gmail.com>.
Seems like this is getting asked more and more, which is unfortunate. I wish I had time to fix this by making flush smarter or by having TWCS split old data, but I don’t.

You can search the list archives for more examples, but what’s probably happening is that you have sstables overlapping, which prevents TWCS from dropping them when fully expired.

The overlaps probably come from either probabilistic read repair or speculative retry read-repairing data into the memtable on the DC that coordinates your reads.
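
To illustrate why an overlap blocks the drop, here is a deliberately simplified model in Python. It is a sketch under stated assumptions, not Cassandra's actual implementation: timestamps are plain day numbers, every sstable is assumed to overlap every other in token range, and gc_grace_seconds and memtables are ignored. The SSTable class and fully_expired function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int      # oldest write timestamp in the file (day number)
    max_timestamp: int      # newest write timestamp in the file (day number)
    max_deletion_time: int  # day on which the last cell in the file expires


def fully_expired(sstables, now):
    """Return names of sstables that could be dropped whole in this simplified model."""
    candidates = [s for s in sstables if s.max_deletion_time < now]
    live = [s for s in sstables if s.max_deletion_time >= now]
    # Oldest write timestamp found in any sstable that still holds live data.
    oldest_live = min((s.min_timestamp for s in live), default=None)
    # A candidate is only safe to drop if all of its data is older than
    # everything in the live sstables; otherwise it might shadow live data.
    return [
        s.name
        for s in candidates
        if oldest_live is None or s.max_timestamp < oldest_live
    ]


# Data written on day 1 with a 14-day TTL expires on day 15.
old = SSTable("old", min_timestamp=1, max_timestamp=1, max_deletion_time=15)
# A later window that contains only day-5 writes.
clean = SSTable("clean", min_timestamp=5, max_timestamp=5, max_deletion_time=19)
# The same window, but a read repair also wrote a day-1 cell into it.
repaired = SSTable("repaired", min_timestamp=1, max_timestamp=5, max_deletion_time=19)

print(fully_expired([old, clean], now=16))     # old window can be dropped
print(fully_expired([old, repaired], now=16))  # old window is blocked
```

In the second call, the read-repaired cell drags the newer file's minimum timestamp back to day 1, so the expired file can no longer be proven safe to drop and stays on disk.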

CASSANDRA-13418 (https://issues.apache.org/jira/browse/CASSANDRA-13418) makes it so you can force sstables to be dropped at expiration regardless of overlaps, but you have to set some properties because it’s technically unsafe (if you write to the table with anything other than TTLs).
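
A hedged sketch of how that option might be set, written as Python that only builds the CQL text. The compaction option name unsafe_aggressive_sstable_expiration comes from that ticket. The table name, window settings, and the ttl_only_writes guard are assumptions made for this example. The ticket also requires a JVM system property on each node; this is believed to be -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true, but check the documentation for your version before relying on it.

```python
def twcs_compaction_options(window_unit, window_size, ttl_only_writes):
    """Build a TWCS option map; enable aggressive expiration only for TTL-only tables."""
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
    }
    # Hypothetical guard: the option is unsafe if the table ever receives
    # explicit deletes or writes without a TTL, so it is opt-in here.
    if ttl_only_writes:
        options["unsafe_aggressive_sstable_expiration"] = "true"
    return options


def to_alter_statement(table, options):
    """Render the option map as a CQL ALTER TABLE statement."""
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


# my_keyspace.my_table and the 1-day window are placeholders.
aggressive = to_alter_statement(
    "my_keyspace.my_table", twcs_compaction_options("DAYS", 1, ttl_only_writes=True)
)
conservative = to_alter_statement(
    "my_keyspace.my_table", twcs_compaction_options("DAYS", 1, ttl_only_writes=False)
)
print(aggressive)
print(conservative)
```

Keeping the flag behind an explicit argument mirrors the warning above: it should only be turned on for tables that are written exclusively with TTLs.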



-- 
Jeff Jirsa



Re: Data growth is abnormal

Posted by Eunsu Kim <eu...@gmail.com>.
I’m using TimeWindowCompactionStrategy.

All consistency levels are ONE.



Re: Data growth is abnormal

Posted by Jeff Jirsa <jj...@gmail.com>.
What compaction strategy are you using?

What consistency level do you use on writes? Reads? 

-- 
Jeff Jirsa

