Posted to user@cassandra.apache.org by crypto five <cr...@gmail.com> on 2012/04/24 03:50:07 UTC

Cassandra dying when gets many deletes

Hi,

I have 50 million rows in a column family on a 4G RAM box. I allocated 2GB
to Cassandra.
I have a program which traverses this CF and cleans some data there; it
generates about 20k delete statements per second.
After about 3 million deletions Cassandra stops responding to queries:
it doesn't react to the CLI, nodetool, etc.
I see in the logs that it tries to free some memory but can't, even if I
wait a whole day.
Also I see the following in the logs:

INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java (line
2647) Unable to reduce heap usage since there are no dirty column families

When I look at a memory dump I see that memory goes to
ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

What can I do to stop Cassandra from dying?
Why can't it free the memory?
Any ideas?

Thank you.

Re: Cassandra dying when gets many deletes

Posted by Віталій Тимчишин <ti...@gmail.com>.
Thanks a lot. It seems that a fix is committed now and will appear in
the next release, so I won't need my own patched Cassandra :)

Best regards, Vitalii Tymchyshyn.

2012/5/3 Andrey Kolyadenko <ak...@gmail.com>

> Hi Vitalii,
>
> I sent the patch.

Re: Cassandra dying when gets many deletes

Posted by Віталій Тимчишин <ti...@gmail.com>.
Glad you've got it working properly. I've tried to keep my changes as
"local" as possible, so I changed only a single value calculation. But it's
possible your way is better and will be accepted by the Cassandra
maintainers. Could you attach your patch to the ticket? I'd like any fix to
be applied to trunk, since currently I have to make my own patched build
each time I upgrade because of this bug.

Best regards, Vitalii Tymchyshyn


Re: Cassandra dying when gets many deletes

Posted by crypto five <cr...@gmail.com>.
I agree with your observations.
On the other hand, I found that ColumnFamily.size() doesn't calculate object
size correctly. It doesn't count the sizes of two object fields and returns
0 if there is no object in the columns container.
I increased the initial size variable value to 24, which is the size of two
objects (I didn't know the correct value), and Cassandra started
calculating the live ratio correctly, increasing the throughput value and
flushing memtables.


Re: Cassandra dying when gets many deletes

Posted by aaron morton <aa...@thelastpickle.com>.
I've not looked into the CASSANDRA-3741 ticket but…

If you reduce the yaml config setting commitlog_total_space_in_mb you can get similar behaviour to the old memtable_flush_* settings, which flushed every CF after X minutes.

Not pretty but it may work in this case.
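For reference, that knob lives in cassandra.yaml; the value below is purely
illustrative, not a recommendation for this workload:

    # cassandra.yaml -- illustrative value only, not a recommendation
    commitlog_total_space_in_mb: 1024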

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Cassandra dying when gets many deletes

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
Hello.

For me " there are no dirty column families" in your message tells it's 
possibly the same problem.
The issue is that column families that gets full row deletes only do not 
get ANY SINGLE dirty byte accounted and so can't be picked by flusher. 
Any ratio can't help simply because it is multiplied by 0. Check your 
cfstats.
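A toy model of the arithmetic described above (hypothetical names, not the
actual flush-selection code): whatever the live ratio is, multiplying it by
zero accounted dirty bytes gives zero, so a delete-only column family never
looks worth flushing.

    // Toy illustration of the flush-selection arithmetic; names are hypothetical.
    public class FlushSelectionSketch {
        // Assume the flusher ranks column families by accounted dirty bytes
        // times live ratio.
        static double estimatedHeapUse(double liveRatio, long accountedDirtyBytes) {
            return liveRatio * accountedDirtyBytes;
        }

        public static void main(String[] args) {
            // CF receiving normal writes: dirty bytes are accounted, so it can be picked.
            System.out.println(estimatedHeapUse(10.0, 5_000_000)); // 5.0E7

            // CF receiving only full-row deletes: zero dirty bytes accounted,
            // so no ratio, however large, makes it a flush candidate.
            System.out.println(estimatedHeapUse(64.0, 0));         // 0.0
        }
    }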



Re: Cassandra dying when gets many deletes

Posted by crypto five <cr...@gmail.com>.
Thank you Vitalii.

Looking at Jonathan's answer to your patch, I think it's probably not my
case. I see that liveRatio is calculated in my case, but the calculations
look strange:

WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181)
setting live ratio to maximum of 64 instead of Infinity
 INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186)
CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0
(just-counted was 64.0).  calculation took 63355ms for 0 columns

Looking at the comment in the code ("If it gets higher than 64 something
is probably broken."), it looks like that's probably the problem.
Not sure how to investigate it.
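The numbers above are consistent with a simple division-by-zero picture (a
sketch under assumptions, not the actual Memtable.java code): if the live
ratio is roughly measured heap size divided by counted throughput, then with
0 columns counted the divisor is 0, the ratio comes out Infinity, and it
gets clamped to the 64 maximum, exactly as the WARN line says.

    // Sketch of the ratio calculation the log lines suggest; an assumption
    // about the shape of the formula, not Cassandra's actual Memtable.java.
    public class LiveRatioSketch {
        // "If it gets higher than 64 something is probably broken."
        static final double MAX_LIVE_RATIO = 64.0;

        static double liveRatio(double measuredDeepSizeBytes, double countedThroughputBytes) {
            double ratio = measuredDeepSizeBytes / countedThroughputBytes;
            return Math.min(ratio, MAX_LIVE_RATIO);
        }

        public static void main(String[] args) {
            // 0 columns counted => 0 throughput => Infinity => clamped to 64, as in the log.
            System.out.println(liveRatio(1_000_000, 0));       // 64.0
            // Normal case: a finite ratio.
            System.out.println(liveRatio(1_000_000, 200_000)); // 5.0
        }
    }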


Re: Cassandra dying when gets many deletes

Posted by Віталій Тимчишин <ti...@gmail.com>.
See https://issues.apache.org/jira/browse/CASSANDRA-3741
I did post a fix there that helped me.




-- 
Best regards,
 Vitalii Tymchyshyn