Posted to user@cassandra.apache.org by Robert Wille <rw...@fold3.com> on 2014/02/01 16:03:55 UTC

Lots of deletions results in death by GC

A few days ago I posted about an issue I'm having where GC takes a long time
(20-30 seconds), and it happens repeatedly and basically no work gets done.
I've done further investigation, and I now believe that I know the cause. If
I do a lot of deletes, it creates memory pressure until the memtables are
flushed, but Cassandra doesn't flush them. If I manually flush, then life is
good again (although that takes a very long time because of the GC issue).
If I just leave the flushing to Cassandra, then I end up with death by GC. I
believe that when the memtables are full of tombstones, Cassandra doesn't
realize how much memory the memtables are actually taking up, and so it
doesn't proactively flush them in order to free up heap.

As I was deleting records out of one of my tables, I was watching it via
nodetool cfstats, and I found a very curious thing:

                Memtable cell count: 1285
                Memtable data size, bytes: 0
                Memtable switch count: 56

As the deletion process was chugging away, the memtable cell count
increased, as expected, but the data size stayed at 0. No flushing occurred.
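
(For context, I was polling the stats with something along these lines; the
grep window just captures the three lines above:)

watch -n 5 'nodetool cfstats | grep -A 2 "Memtable cell count"'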

Here's the schema for this table:

CREATE TABLE bdn_index_pub (
    tshard VARCHAR,
    pord INT,
    ord INT,
    hpath VARCHAR,
    page BIGINT,
    PRIMARY KEY (tshard, pord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };


I have a few tables that I run this cleaning process on, and not all of them
exhibit this behavior. One of them reported an increasing number of bytes,
as expected, and it also flushed as expected. Here's the schema for that
table:


CREATE TABLE bdn_index_child (
    ptshard VARCHAR,
    ord INT,
    hpath VARCHAR,
    PRIMARY KEY (ptshard, ord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };


In both cases, I'm deleting the entire record (i.e. specifying just the
first component of the primary key in the delete statement). Most records in
bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just
a handful of rows, but a few records can have up to 10,000.
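
(The deletes look like this; the shard value here is just illustrative:)

DELETE FROM bdn_index_pub WHERE tshard = 'shard-0001';
DELETE FROM bdn_index_child WHERE ptshard = 'shard-0001';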

Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
doesn't seem like nearly enough to create a memory problem. Perhaps there
are other flaws in the memory metering. Or perhaps there is some other issue
that causes Cassandra to mismanage the heap when there are a lot of deletes.
One other thought I had is that I page through these tables and clean them
out as I go. Perhaps there is some interaction between the paging and the
deleting that causes the GC problems and I should create a list of keys to
delete and then delete them after I've finished reading the entire table.
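
(In other words, something like this as a second pass instead of
interleaving the deletes with the reads; a sketch only:)

-- pass 1: collect the partition keys while reading the table
SELECT DISTINCT tshard FROM bdn_index_pub;

-- pass 2: delete each collected key after the full read is finished
DELETE FROM bdn_index_pub WHERE tshard = ?;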

I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
GB, in hopes that it would force Cassandra to flush tables before I ran into
death by GC, but it didn't seem to help.
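
(That is, in cassandra.yaml; the 2.7 GB figure above suggests the default
was about one third of an 8 GB heap:)

memtable_total_space_in_mb: 1024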

I'm using Cassandra 2.0.4.

Any insights would be greatly appreciated. I can't be the only one that has
periodic delete-heavy workloads. Hopefully someone else has run into this
and can give advice.

Thanks

Robert



Re: Lots of deletions results in death by GC

Posted by Benedict Elliott Smith <be...@datastax.com>.
You should find that the patch will apply cleanly to the 2.0.5 release, so
you could apply it yourself.
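
(Illustrative only: something along these lines against a 2.0.5 source
tree, assuming the patch attached to CASSANDRA-6655 is in git format:)

cd apache-cassandra-2.0.5-src
patch -p1 < 6655.patch
ant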


On 5 February 2014 18:56, Robert Wille <rw...@fold3.com> wrote:

> Thank you so much. Everything I had seen pointed to this being the case.
> I'm glad that someone in the know has confirmed this bug and fixed it. Now
> I just need to figure out where to go from here: do I wait, use the dev
> branch or work around.
>
> Robert
>
> From: Benedict Elliott Smith <be...@datastax.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Wednesday, February 5, 2014 at 8:32 AM
>
> To: <us...@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> I believe there is a bug, and I have filed a ticket for it:
> https://issues.apache.org/jira/browse/CASSANDRA-6655
>
> I will have a patch uploaded shortly, but it's just missed the 2.0.5
> release window, so you'll either need to grab the development branch once
> it's committed or wait until 2.0.6
>
>
> On 5 February 2014 15:09, Robert Wille <rw...@fold3.com> wrote:
>
>> Yes. It's kind of an unusual workload. An insertion phase followed by a
>> deletion phase, generally not overlapping.
>>
>> From: Benedict Elliott Smith <be...@datastax.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Tuesday, February 4, 2014 at 5:29 PM
>> To: <us...@cassandra.apache.org>
>>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> Is it possible you are generating *exclusively* deletes for this table?
>>
>>
>> On 5 February 2014 00:10, Robert Wille <rw...@fold3.com> wrote:
>>
>>> I ran my test again, and Flush Writer's "All time blocked" increased to
>>> 2 and then shortly thereafter GC went into its death spiral. I doubled
>>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>>> tried again.
>>>
>>> This time, the table that always sat with Memtable data size = 0 now
>>> showed increases in Memtable data size. That was encouraging. It never
>>> flushed, which isn't too surprising, because that table has relatively few
>>> rows and they are pretty wide. However, on the fourth table to clean, Flush
>>> Writer's "All time blocked" went to 1, and then there were no more
>>> completed events, and about 10 minutes later GC went into its death spiral.
>>> I assume that each time Flush Writer completes an event, that means a table
>>> was flushed. Is that right? Also, I got two dropped mutation messages at
>>> the same time that Flush Writer's All time blocked incremented.
>>>
>>> I then increased the writers and queue size to 3 and 12, respectively,
>>> and ran my test again. This time All time blocked remained at 0, but I
>>> still suffered death by GC.
>>>
>>> I would almost think that this is caused by high load on the server, but
>>> I've never seen CPU utilization go above about two of my eight available
>>> cores. If high load triggers this problem, then that is very disconcerting.
>>> That means that a CPU spike could permanently cripple a node. Okay, not
>>> permanently, but until a manual flush occurs.
>>>
>>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>>> the end of my rope.
>>>
>>> Thanks in advance
>>>
>>> Robert
>>>
>>> From: Nate McCall <na...@thelastpickle.com>
>>> Reply-To: <us...@cassandra.apache.org>
>>> Date: Saturday, February 1, 2014 at 9:25 AM
>>> To: Cassandra Users <us...@cassandra.apache.org>
>>> Subject: Re: Lots of deletions results in death by GC
>>>
>>> What's the output of 'nodetool tpstats' while this is happening?
>>> Specifically is Flush Writer "All time blocked" increasing? If so, play
>>> around with turning up memtable_flush_writers and memtable_flush_queue_size
>>> and see if that helps.
>>>
>>>
>>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>>>
>>>> A few days ago I posted about an issue I'm having where GC takes a long
>>>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>>>> done. I've done further investigation, and I now believe that I know the
>>>> cause. If I do a lot of deletes, it creates memory pressure until the
>>>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>>>> flush, then life is good again (although that takes a very long time
>>>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>>>> end up with death by GC. I believe that when the memtables are full of
>>>> tombstones, Cassandra doesn't realize how much memory the memtables are
>>>> actually taking up, and so it doesn't proactively flush them in order to
>>>> free up heap.
>>>>
>>>> As I was deleting records out of one of my tables, I was watching it
>>>> via nodetool cfstats, and I found a very curious thing:
>>>>
>>>>                 Memtable cell count: 1285
>>>>                 Memtable data size, bytes: 0
>>>>                 Memtable switch count: 56
>>>>
>>>> As the deletion process was chugging away, the memtable cell count
>>>> increased, as expected, but the data size stayed at 0. No flushing
>>>> occurred.
>>>>
>>>> Here's the schema for this table:
>>>>
>>>> CREATE TABLE bdn_index_pub (
>>>>
>>>> tshard VARCHAR,
>>>>
>>>> pord INT,
>>>>
>>>> ord INT,
>>>>
>>>> hpath VARCHAR,
>>>>
>>>> page BIGINT,
>>>>
>>>> PRIMARY KEY (tshard, pord)
>>>>
>>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>>
>>>> I have a few tables that I run this cleaning process on, and not all of
>>>> them exhibit this behavior. One of them reported an increasing number of
>>>> bytes, as expected, and it also flushed as expected. Here's the schema for
>>>> that table:
>>>>
>>>>
>>>> CREATE TABLE bdn_index_child (
>>>>
>>>> ptshard VARCHAR,
>>>>
>>>> ord INT,
>>>>
>>>> hpath VARCHAR,
>>>>
>>>> PRIMARY KEY (ptshard, ord)
>>>>
>>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>>
>>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>>> first component of the primary key in the delete statement). Most records
>>>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>>>> just a handful of rows, but a few records can have up to 10,000.
>>>>
>>>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>>>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>>>> are other flaws in the memory metering. Or perhaps there is some other
>>>> issue that causes Cassandra to mismanage the heap when there are a lot of
>>>> deletes. One other thought I had is that I page through these tables and
>>>> clean them out as I go. Perhaps there is some interaction between the
>>>> paging and the deleting that causes the GC problems and I should create a
>>>> list of keys to delete and then delete them after I've finished reading the
>>>> entire table.
>>>>
>>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>>>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>>>> ran into death by GC, but it didn't seem to help.
>>>>
>>>> I'm using Cassandra 2.0.4.
>>>>
>>>> Any insights would be greatly appreciated. I can't be the only one that
>>>> has periodic delete-heavy workloads. Hopefully someone else has run into
>>>> this and can give advice.
>>>>
>>>> Thanks
>>>>
>>>> Robert
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>

Re: Lots of deletions results in death by GC

Posted by Robert Wille <rw...@fold3.com>.
Thank you so much. Everything I had seen pointed to this being the case. I'm
glad that someone in the know has confirmed this bug and fixed it. Now I
just need to figure out where to go from here: do I wait, use the dev branch
or work around.

Robert

From:  Benedict Elliott Smith <be...@datastax.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Wednesday, February 5, 2014 at 8:32 AM
To:  <us...@cassandra.apache.org>
Subject:  Re: Lots of deletions results in death by GC

I believe there is a bug, and I have filed a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it's just missed the 2.0.5 release
window, so you'll either need to grab the development branch once it's
committed or wait until 2.0.6


On 5 February 2014 15:09, Robert Wille <rw...@fold3.com> wrote:
> Yes. It's kind of an unusual workload. An insertion phase followed by a
> deletion phase, generally not overlapping.
> 
> From:  Benedict Elliott Smith <be...@datastax.com>
> Reply-To:  <us...@cassandra.apache.org>
> Date:  Tuesday, February 4, 2014 at 5:29 PM
> To:  <us...@cassandra.apache.org>
> 
> Subject:  Re: Lots of deletions results in death by GC
> 
> Is it possible you are generating exclusively deletes for this table?
> 
> 
> On 5 February 2014 00:10, Robert Wille <rw...@fold3.com> wrote:
>> I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
>> then shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
>> again.
>> 
>> This time, the table that always sat with Memtable data size = 0 now showed
>> increases in Memtable data size. That was encouraging. It never flushed,
>> which isn't too surprising, because that table has relatively few rows and
>> they are pretty wide. However, on the fourth table to clean, Flush Writer's
>> "All time blocked" went to 1, and then there were no more completed events,
>> and about 10 minutes later GC went into its death spiral. I assume that each
>> time Flush Writer completes an event, that means a table was flushed. Is that
>> right? Also, I got two dropped mutation messages at the same time that Flush
>> Writer's All time blocked incremented.
>> 
>> I then increased the writers and queue size to 3 and 12, respectively, and
>> ran my test again. This time All time blocked remained at 0, but I still
>> suffered death by GC.
>> 
>> I would almost think that this is caused by high load on the server, but I've
>> never seen CPU utilization go above about two of my eight available cores. If
>> high load triggers this problem, then that is very disconcerting. That means
>> that a CPU spike could permanently cripple a node. Okay, not permanently, but
>> until a manual flush occurs.
>> 
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at the
>> end of my rope.
>> 
>> Thanks in advance
>> 
>> Robert
>> 
>> From:  Nate McCall <na...@thelastpickle.com>
>> Reply-To:  <us...@cassandra.apache.org>
>> Date:  Saturday, February 1, 2014 at 9:25 AM
>> To:  Cassandra Users <us...@cassandra.apache.org>
>> Subject:  Re: Lots of deletions results in death by GC
>> 
>> What's the output of 'nodetool tpstats' while this is happening? Specifically
>> is Flush Writer "All time blocked" increasing? If so, play around with
>> turning up memtable_flush_writers and memtable_flush_queue_size and see if
>> that helps.
>> 
>> 
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>>> A few days ago I posted about an issue I'm having where GC takes a long time
>>> (20-30 seconds), and it happens repeatedly and basically no work gets done.
>>> I've done further investigation, and I now believe that I know the cause. If
>>> I do a lot of deletes, it creates memory pressure until the memtables are
>>> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
>>> good again (although that takes a very long time because of the GC issue).
>>> If I just leave the flushing to Cassandra, then I end up with death by GC. I
>>> believe that when the memtables are full of tombstones, Cassandra doesn't
>>> realize how much memory the memtables are actually taking up, and so it
>>> doesn't proactively flush them in order to free up heap.
>>> 
>>> As I was deleting records out of one of my tables, I was watching it via
>>> nodetool cfstats, and I found a very curious thing:
>>> 
>>>                 Memtable cell count: 1285
>>>                 Memtable data size, bytes: 0
>>>                 Memtable switch count: 56
>>> 
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing occurred.
>>> 
>>> Here's the schema for this table:
>>> 
>>> CREATE TABLE bdn_index_pub (
>>> 
>>> tshard VARCHAR,
>>> 
>>> pord INT,
>>> 
>>> ord INT,
>>> 
>>> hpath VARCHAR,
>>> 
>>> page BIGINT,
>>> 
>>> PRIMARY KEY (tshard, pord)
>>> 
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>> 
>>> 
>>> I have a few tables that I run this cleaning process on, and not all of them
>>> exhibit this behavior. One of them reported an increasing number of bytes,
>>> as expected, and it also flushed as expected. Here's the schema for that
>>> table:
>>> 
>>> 
>>> CREATE TABLE bdn_index_child (
>>> 
>>> ptshard VARCHAR,
>>> 
>>> ord INT,
>>> 
>>> hpath VARCHAR,
>>> 
>>> PRIMARY KEY (ptshard, ord)
>>> 
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>> 
>>> 
>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>> first component of the primary key in the delete statement). Most records in
>>> bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just
>>> a handful of rows, but a few records can have up to 10,000.
>>> 
>>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>>> are other flaws in the memory metering. Or perhaps there is some other issue
>>> that causes Cassandra to mismanage the heap when there are a lot of deletes.
>>> One other thought I had is that I page through these tables and clean them
>>> out as I go. Perhaps there is some interaction between the paging and the
>>> deleting that causes the GC problems and I should create a list of keys to
>>> delete and then delete them after I've finished reading the entire table.
>>> 
>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
>>> GB, in hopes that it would force Cassandra to flush tables before I ran into
>>> death by GC, but it didn't seem to help.
>>> 
>>> I'm using Cassandra 2.0.4.
>>> 
>>> Any insights would be greatly appreciated. I can't be the only one that has
>>> periodic delete-heavy workloads. Hopefully someone else has run into this
>>> and can give advice.
>>> 
>>> Thanks
>>> 
>>> Robert
>> 
>> 
>> 
>> -- 
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>> 
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
> 




Re: Lots of deletions results in death by GC

Posted by Benedict Elliott Smith <be...@datastax.com>.
I believe there is a bug, and I have filed a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it's just missed the 2.0.5
release window, so you'll either need to grab the development branch once
it's committed or wait until 2.0.6


On 5 February 2014 15:09, Robert Wille <rw...@fold3.com> wrote:

> Yes. It's kind of an unusual workload. An insertion phase followed by a
> deletion phase, generally not overlapping.
>
> From: Benedict Elliott Smith <be...@datastax.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Tuesday, February 4, 2014 at 5:29 PM
> To: <us...@cassandra.apache.org>
>
> Subject: Re: Lots of deletions results in death by GC
>
> Is it possible you are generating *exclusively* deletes for this table?
>
>
> On 5 February 2014 00:10, Robert Wille <rw...@fold3.com> wrote:
>
>> I ran my test again, and Flush Writer's "All time blocked" increased to 2
>> and then shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>> tried again.
>>
>> This time, the table that always sat with Memtable data size = 0 now
>> showed increases in Memtable data size. That was encouraging. It never
>> flushed, which isn't too surprising, because that table has relatively few
>> rows and they are pretty wide. However, on the fourth table to clean, Flush
>> Writer's "All time blocked" went to 1, and then there were no more
>> completed events, and about 10 minutes later GC went into its death spiral.
>> I assume that each time Flush Writer completes an event, that means a table
>> was flushed. Is that right? Also, I got two dropped mutation messages at
>> the same time that Flush Writer's All time blocked incremented.
>>
>> I then increased the writers and queue size to 3 and 12, respectively,
>> and ran my test again. This time All time blocked remained at 0, but I
>> still suffered death by GC.
>>
>> I would almost think that this is caused by high load on the server, but
>> I've never seen CPU utilization go above about two of my eight available
>> cores. If high load triggers this problem, then that is very disconcerting.
>> That means that a CPU spike could permanently cripple a node. Okay, not
>> permanently, but until a manual flush occurs.
>>
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>> the end of my rope.
>>
>> Thanks in advance
>>
>> Robert
>>
>> From: Nate McCall <na...@thelastpickle.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Saturday, February 1, 2014 at 9:25 AM
>> To: Cassandra Users <us...@cassandra.apache.org>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> What's the output of 'nodetool tpstats' while this is happening?
>> Specifically is Flush Writer "All time blocked" increasing? If so, play
>> around with turning up memtable_flush_writers and memtable_flush_queue_size
>> and see if that helps.
>>
>>
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>>
>>> A few days ago I posted about an issue I'm having where GC takes a long
>>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>>> done. I've done further investigation, and I now believe that I know the
>>> cause. If I do a lot of deletes, it creates memory pressure until the
>>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>>> flush, then life is good again (although that takes a very long time
>>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>>> end up with death by GC. I believe that when the memtables are full of
>>> tombstones, Cassandra doesn't realize how much memory the memtables are
>>> actually taking up, and so it doesn't proactively flush them in order to
>>> free up heap.
>>>
>>> As I was deleting records out of one of my tables, I was watching it via
>>> nodetool cfstats, and I found a very curious thing:
>>>
>>>                 Memtable cell count: 1285
>>>                 Memtable data size, bytes: 0
>>>                 Memtable switch count: 56
>>>
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing
>>> occurred.
>>>
>>> Here's the schema for this table:
>>>
>>> CREATE TABLE bdn_index_pub (
>>>
>>> tshard VARCHAR,
>>>
>>> pord INT,
>>>
>>> ord INT,
>>>
>>> hpath VARCHAR,
>>>
>>> page BIGINT,
>>>
>>> PRIMARY KEY (tshard, pord)
>>>
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> I have a few tables that I run this cleaning process on, and not all of
>>> them exhibit this behavior. One of them reported an increasing number of
>>> bytes, as expected, and it also flushed as expected. Here's the schema for
>>> that table:
>>>
>>>
>>> CREATE TABLE bdn_index_child (
>>>
>>> ptshard VARCHAR,
>>>
>>> ord INT,
>>>
>>> hpath VARCHAR,
>>>
>>> PRIMARY KEY (ptshard, ord)
>>>
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>> first component of the primary key in the delete statement). Most records
>>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>>> just a handful of rows, but a few records can have up to 10,000.
>>>
>>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>>> are other flaws in the memory metering. Or perhaps there is some other
>>> issue that causes Cassandra to mismanage the heap when there are a lot of
>>> deletes. One other thought I had is that I page through these tables and
>>> clean them out as I go. Perhaps there is some interaction between the
>>> paging and the deleting that causes the GC problems and I should create a
>>> list of keys to delete and then delete them after I've finished reading the
>>> entire table.
>>>
>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>>> ran into death by GC, but it didn't seem to help.
>>>
>>> I'm using Cassandra 2.0.4.
>>>
>>> Any insights would be greatly appreciated. I can't be the only one that
>>> has periodic delete-heavy workloads. Hopefully someone else has run into
>>> this and can give advice.
>>>
>>> Thanks
>>>
>>> Robert
>>>
>>
>>
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>

Re: Lots of deletions results in death by GC

Posted by Robert Wille <rw...@fold3.com>.
Yes. It's kind of an unusual workload. An insertion phase followed by a
deletion phase, generally not overlapping.

From:  Benedict Elliott Smith <be...@datastax.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Tuesday, February 4, 2014 at 5:29 PM
To:  <us...@cassandra.apache.org>
Subject:  Re: Lots of deletions results in death by GC

Is it possible you are generating exclusively deletes for this table?


On 5 February 2014 00:10, Robert Wille <rw...@fold3.com> wrote:
> I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
> then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
> again.
> 
> This time, the table that always sat with Memtable data size = 0 now showed
> increases in Memtable data size. That was encouraging. It never flushed, which
> isn't too surprising, because that table has relatively few rows and they are
> pretty wide. However, on the fourth table to clean, Flush Writer's "All time
> blocked" went to 1, and then there were no more completed events, and about 10
> minutes later GC went into its death spiral. I assume that each time Flush
> Writer completes an event, that means a table was flushed. Is that right?
> Also, I got two dropped mutation messages at the same time that Flush Writer's
> All time blocked incremented.
> 
> I then increased the writers and queue size to 3 and 12, respectively, and ran
> my test again. This time All time blocked remained at 0, but I still suffered
> death by GC.
> 
> I would almost think that this is caused by high load on the server, but I've
> never seen CPU utilization go above about two of my eight available cores. If
> high load triggers this problem, then that is very disconcerting. That means
> that a CPU spike could permanently cripple a node. Okay, not permanently, but
> until a manual flush occurs.
> 
> If anyone has any further thoughts, I'd love to hear them. I'm quite at the
> end of my rope.
> 
> Thanks in advance
> 
> Robert
> 
> From:  Nate McCall <na...@thelastpickle.com>
> Reply-To:  <us...@cassandra.apache.org>
> Date:  Saturday, February 1, 2014 at 9:25 AM
> To:  Cassandra Users <us...@cassandra.apache.org>
> Subject:  Re: Lots of deletions results in death by GC
> 
> What's the output of 'nodetool tpstats' while this is happening? Specifically
> is Flush Writer "All time blocked" increasing? If so, play around with turning
> up memtable_flush_writers and memtable_flush_queue_size and see if that helps.
> 
> 
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>> A few days ago I posted about an issue I'm having where GC takes a long time
>> (20-30 seconds), and it happens repeatedly and basically no work gets done.
>> I've done further investigation, and I now believe that I know the cause. If
>> I do a lot of deletes, it creates memory pressure until the memtables are
>> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
>> good again (although that takes a very long time because of the GC issue). If
>> I just leave the flushing to Cassandra, then I end up with death by GC. I
>> believe that when the memtables are full of tombstones, Cassandra doesn't
>> realize how much memory the memtables are actually taking up, and so it
>> doesn't proactively flush them in order to free up heap.
>> 
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>> 
>>                 Memtable cell count: 1285
>>                 Memtable data size, bytes: 0
>>                 Memtable switch count: 56
>> 
>> As the deletion process was chugging away, the memtable cell count increased,
>> as expected, but the data size stayed at 0. No flushing occurred.
>> 
>> Here's the schema for this table:
>> 
>> CREATE TABLE bdn_index_pub (
>> 
>> tshard VARCHAR,
>> 
>> pord INT,
>> 
>> ord INT,
>> 
>> hpath VARCHAR,
>> 
>> page BIGINT,
>> 
>> PRIMARY KEY (tshard, pord)
>> 
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>> 
>> 
>> I have a few tables that I run this cleaning process on, and not all of them
>> exhibit this behavior. One of them reported an increasing number of bytes, as
>> expected, and it also flushed as expected. Here's the schema for that table:
>> 
>> 
>> CREATE TABLE bdn_index_child (
>> 
>> ptshard VARCHAR,
>> 
>> ord INT,
>> 
>> hpath VARCHAR,
>> 
>> PRIMARY KEY (ptshard, ord)
>> 
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>> 
>> 
>> In both cases, I'm deleting the entire record (i.e. specifying just the first
>> component of the primary key in the delete statement). Most records in
>> bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
>> handful of rows, but a few records can have up to 10,000.
>> 
>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>> doesn't seem like nearly enough to create a memory problem. Perhaps there are
>> other flaws in the memory metering. Or perhaps there is some other issue that
>> causes Cassandra to mismanage the heap when there are a lot of deletes. One
>> other thought I had is that I page through these tables and clean them out as
>> I go. Perhaps there is some interaction between the paging and the deleting
>> that causes the GC problems and I should create a list of keys to delete and
>> then delete them after I've finished reading the entire table.
>> 
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
>> GB, in hopes that it would force Cassandra to flush tables before I ran into
>> death by GC, but it didn't seem to help.
>> 
>> I'm using Cassandra 2.0.4.
>> 
>> Any insights would be greatly appreciated. I can't be the only one that has
>> periodic delete-heavy workloads. Hopefully someone else has run into this and
>> can give advice.
>> 
>> Thanks
>> 
>> Robert
> 
> 
> 
> -- 
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> 
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com




Re: Lots of deletions results in death by GC

Posted by Benedict Elliott Smith <be...@datastax.com>.
Is it possible you are generating *exclusively* deletes for this table?


On 5 February 2014 00:10, Robert Wille <rw...@fold3.com> wrote:

> I ran my test again, and Flush Writer's "All time blocked" increased to 2
> and then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
> tried again.
>
> This time, the table that always sat with Memtable data size = 0 now
> showed increases in Memtable data size. That was encouraging. It never
> flushed, which isn't too surprising, because that table has relatively few
> rows and they are pretty wide. However, on the fourth table to clean, Flush
> Writer's "All time blocked" went to 1, and then there were no more
> completed events, and about 10 minutes later GC went into its death spiral.
> I assume that each time Flush Writer completes an event, that means a table
> was flushed. Is that right? Also, I got two dropped mutation messages at
> the same time that Flush Writer's All time blocked incremented.
>
> I then increased the writers and queue size to 3 and 12, respectively, and
> ran my test again. This time All time blocked remained at 0, but I still
> suffered death by GC.
>
> I would almost think that this is caused by high load on the server, but
> I've never seen CPU utilization go above about two of my eight available
> cores. If high load triggers this problem, then that is very disconcerting.
> That means that a CPU spike could permanently cripple a node. Okay, not
> permanently, but until a manual flush occurs.
>
> If anyone has any further thoughts, I'd love to hear them. I'm quite at
> the end of my rope.
>
> Thanks in advance
>
> Robert
>
> From: Nate McCall <na...@thelastpickle.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Saturday, February 1, 2014 at 9:25 AM
> To: Cassandra Users <us...@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> What's the output of 'nodetool tpstats' while this is happening?
> Specifically is Flush Writer "All time blocked" increasing? If so, play
> around with turning up memtable_flush_writers and memtable_flush_queue_size
> and see if that helps.
>
>
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>
>> A few days ago I posted about an issue I'm having where GC takes a long
>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>> done. I've done further investigation, and I now believe that I know the
>> cause. If I do a lot of deletes, it creates memory pressure until the
>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>> flush, then life is good again (although that takes a very long time
>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>> end up with death by GC. I believe that when the memtables are full of
>> tombstones, Cassandra doesn't realize how much memory the memtables are
>> actually taking up, and so it doesn't proactively flush them in order to
>> free up heap.
>>
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>>
>>                 Memtable cell count: 1285
>>                 Memtable data size, bytes: 0
>>                 Memtable switch count: 56
>>
>> As the deletion process was chugging away, the memtable cell count
>> increased, as expected, but the data size stayed at 0. No flushing
>> occurred.
>>
>> Here's the schema for this table:
>>
>> CREATE TABLE bdn_index_pub (
>>
>> tshard VARCHAR,
>>
>> pord INT,
>>
>> ord INT,
>>
>> hpath VARCHAR,
>>
>> page BIGINT,
>>
>> PRIMARY KEY (tshard, pord)
>>
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> I have a few tables that I run this cleaning process on, and not all of
>> them exhibit this behavior. One of them reported an increasing number of
>> bytes, as expected, and it also flushed as expected. Here's the schema for
>> that table:
>>
>>
>> CREATE TABLE bdn_index_child (
>>
>> ptshard VARCHAR,
>>
>> ord INT,
>>
>> hpath VARCHAR,
>>
>> PRIMARY KEY (ptshard, ord)
>>
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> In both cases, I'm deleting the entire record (i.e. specifying just the
>> first component of the primary key in the delete statement). Most records
>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>> just a handful of rows, but a few records can have up to 10,000.
>>
>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>> are other flaws in the memory metering. Or perhaps there is some other
>> issue that causes Cassandra to mismanage the heap when there are a lot of
>> deletes. One other thought I had is that I page through these tables and
>> clean them out as I go. Perhaps there is some interaction between the
>> paging and the deleting that causes the GC problems and I should create a
>> list of keys to delete and then delete them after I've finished reading the
>> entire table.
>>
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>> ran into death by GC, but it didn't seem to help.
>>
>> I'm using Cassandra 2.0.4.
>>
>> Any insights would be greatly appreciated. I can't be the only one that
>> has periodic delete-heavy workloads. Hopefully someone else has run into
>> this and can give advice.
>>
>> Thanks
>>
>> Robert
>>
>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Lots of deletions results in death by GC

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Feb 4, 2014 at 4:10 PM, Robert Wille <rw...@fold3.com> wrote:

> I would almost think that this is caused by high load on the server, but
> I've never seen CPU utilization go above about two of my eight available
> cores. If high load triggers this problem, then that is very disconcerting.
> That means that a CPU spike could permanently cripple a node. Okay, not
> permanently, but until a manual flush occurs.
>

While it is unlikely to be related to CPU (i/o much more likely...), the
JVM is certainly capable of finding itself in a low-heap situation which:

1) performs very poorly
2) does not crash
3) cannot be recovered from

This usually requires a restart of the JVM. This mental model should be an
expectation for any application running in the JVM.
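
(If you want to watch this happening, GC logging makes it obvious; for
example, something like this in cassandra-env.sh:)

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"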

=Rob

Re: Lots of deletions results in death by GC

Posted by Ben Hood <0x...@gmail.com>.
On Wed, Feb 5, 2014 at 2:52 AM, srmore <co...@gmail.com> wrote:
> Dropped messages are the sign that Cassandra is taking heavy load; that's
> the load shedding mechanism. I would love to see some sort of back-pressure
> implemented.

+1 for back pressure in general with Cassandra

Re: Lots of deletions results in death by GC

Posted by srmore <co...@gmail.com>.
Sorry to hear that, Robert. I ran into a similar issue a while ago. I had an
extremely heavy write and update load; as a result, Cassandra (1.2.9) was
constantly flushing to disk and constantly GCing. I tried exactly the same
steps you tried (tuning memtable_flush_writers (to 2) and
memtable_flush_queue_size (to 8)) with no luck. Almost all of the issues went
away when I migrated to 1.2.13; this release also had some fixes which I
badly needed. What version are you running? (I tried to look in the
thread but couldn't find one; sorry if this is a repeat question.)

Dropped messages are the sign that Cassandra is taking heavy load; that's
the load shedding mechanism. I would love to see some sort of back-pressure
implemented.

-sandeep


On Tue, Feb 4, 2014 at 6:10 PM, Robert Wille <rw...@fold3.com> wrote:

> I ran my test again, and Flush Writer's "All time blocked" increased to 2
> and then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
> tried again.
>
> This time, the table that always sat with Memtable data size = 0 now
> showed increases in Memtable data size. That was encouraging. It never
> flushed, which isn't too surprising, because that table has relatively few
> rows and they are pretty wide. However, on the fourth table to clean, Flush
> Writer's "All time blocked" went to 1, and then there were no more
> completed events, and about 10 minutes later GC went into its death spiral.
> I assume that each time Flush Writer completes an event, that means a table
> was flushed. Is that right? Also, I got two dropped mutation messages at
> the same time that Flush Writer's All time blocked incremented.
>
> I then increased the writers and queue size to 3 and 12, respectively, and
> ran my test again. This time All time blocked remained at 0, but I still
> suffered death by GC.
>
> I would almost think that this is caused by high load on the server, but
> I've never seen CPU utilization go above about two of my eight available
> cores. If high load triggers this problem, then that is very disconcerting.
> That means that a CPU spike could permanently cripple a node. Okay, not
> permanently, but until a manual flush occurs.
>
> If anyone has any further thoughts, I'd love to hear them. I'm quite at
> the end of my rope.
>
> Thanks in advance
>
> Robert
>
> From: Nate McCall <na...@thelastpickle.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Saturday, February 1, 2014 at 9:25 AM
> To: Cassandra Users <us...@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> What's the output of 'nodetool tpstats' while this is happening?
> Specifically is Flush Writer "All time blocked" increasing? If so, play
> around with turning up memtable_flush_writers and memtable_flush_queue_size
> and see if that helps.
>
>
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
>
>> A few days ago I posted about an issue I'm having where GC takes a long
>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>> done. I've done further investigation, and I now believe that I know the
>> cause. If I do a lot of deletes, it creates memory pressure until the
>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>> flush, then life is good again (although that takes a very long time
>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>> end up with death by GC. I believe that when the memtables are full of
>> tombstones, Cassandra doesn't realize how much memory the memtables are
>> actually taking up, and so it doesn't proactively flush them in order to
>> free up heap.
>>
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>>
>>                 Memtable cell count: 1285
>>                 Memtable data size, bytes: 0
>>                 Memtable switch count: 56
>>
>> As the deletion process was chugging away, the memtable cell count
>> increased, as expected, but the data size stayed at 0. No flushing
>> occurred.
>>
>> Here's the schema for this table:
>>
>> CREATE TABLE bdn_index_pub (
>>
>> tshard VARCHAR,
>>
>> pord INT,
>>
>> ord INT,
>>
>> hpath VARCHAR,
>>
>> page BIGINT,
>>
>> PRIMARY KEY (tshard, pord)
>>
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> I have a few tables that I run this cleaning process on, and not all of
>> them exhibit this behavior. One of them reported an increasing number of
>> bytes, as expected, and it also flushed as expected. Here's the schema for
>> that table:
>>
>>
>> CREATE TABLE bdn_index_child (
>>
>> ptshard VARCHAR,
>>
>> ord INT,
>>
>> hpath VARCHAR,
>>
>> PRIMARY KEY (ptshard, ord)
>>
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> In both cases, I'm deleting the entire record (i.e. specifying just the
>> first component of the primary key in the delete statement). Most records
>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>> just a handful of rows, but a few records can have up to 10,000.
>>
>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>> are other flaws in the memory metering. Or perhaps there is some other
>> issue that causes Cassandra to mismanage the heap when there are a lot of
>> deletes. One other thought I had is that I page through these tables and
>> clean them out as I go. Perhaps there is some interaction between the
>> paging and the deleting that causes the GC problems and I should create a
>> list of keys to delete and then delete them after I've finished reading the
>> entire table.
>>
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>> ran into death by GC, but it didn't seem to help.
>>
>> I'm using Cassandra 2.0.4.
>>
>> Any insights would be greatly appreciated. I can't be the only one that
>> has periodic delete-heavy workloads. Hopefully someone else has run into
>> this and can give advice.
>>
>> Thanks
>>
>> Robert
>>
>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Lots of deletions results in death by GC

Posted by Robert Wille <rw...@fold3.com>.
I ran my test again, and Flush Writer's "All time blocked" increased to 2
and then shortly thereafter GC went into its death spiral. I doubled
memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
again.

This time, the table that always sat with Memtable data size = 0 now showed
increases in Memtable data size. That was encouraging. It never flushed,
which isn't too surprising, because that table has relatively few rows and
they are pretty wide. However, on the fourth table to clean, Flush Writer's
"All time blocked" went to 1, and then there were no more completed events,
and about 10 minutes later GC went into its death spiral. I assume that each
time Flush Writer completes an event, that means a table was flushed. Is
that right? Also, I got two dropped mutation messages at the same time that
Flush Writer's All time blocked incremented.

I then increased the writers and queue size to 3 and 12, respectively, and
ran my test again. This time All time blocked remained at 0, but I still
suffered death by GC.

I would almost think that this is caused by high load on the server, but
I've never seen CPU utilization go above about two of my eight available
cores. If high load triggers this problem, then that is very disconcerting.
That means that a CPU spike could permanently cripple a node. Okay, not
permanently, but until a manual flush occurs.

If anyone has any further thoughts, I'd love to hear them. I'm quite at the
end of my rope.

Thanks in advance

Robert

From:  Nate McCall <na...@thelastpickle.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Saturday, February 1, 2014 at 9:25 AM
To:  Cassandra Users <us...@cassandra.apache.org>
Subject:  Re: Lots of deletions results in death by GC

What's the output of 'nodetool tpstats' while this is happening?
Specifically is Flush Writer "All time blocked" increasing? If so, play
around with turning up memtable_flush_writers and memtable_flush_queue_size
and see if that helps.


On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:
> A few days ago I posted about an issue I'm having where GC takes a long time
> (20-30 seconds), and it happens repeatedly and basically no work gets done.
> I've done further investigation, and I now believe that I know the cause. If I
> do a lot of deletes, it creates memory pressure until the memtables are
> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
> good again (although that takes a very long time because of the GC issue). If
> I just leave the flushing to Cassandra, then I end up with death by GC. I
> believe that when the memtables are full of tombstones, Cassandra doesn't
> realize how much memory the memtables are actually taking up, and so it
> doesn't proactively flush them in order to free up heap.
> 
> As I was deleting records out of one of my tables, I was watching it via
> nodetool cfstats, and I found a very curious thing:
> 
>                 Memtable cell count: 1285
>                 Memtable data size, bytes: 0
>                 Memtable switch count: 56
> 
> As the deletion process was chugging away, the memtable cell count increased,
> as expected, but the data size stayed at 0. No flushing occurred.
> 
> Here's the schema for this table:
> 
> CREATE TABLE bdn_index_pub (
> 
> tshard VARCHAR,
> 
> pord INT,
> 
> ord INT,
> 
> hpath VARCHAR,
> 
> page BIGINT,
> 
> PRIMARY KEY (tshard, pord)
> 
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
> 
> 
> I have a few tables that I run this cleaning process on, and not all of them
> exhibit this behavior. One of them reported an increasing number of bytes, as
> expected, and it also flushed as expected. Here's the schema for that table:
> 
> 
> CREATE TABLE bdn_index_child (
> 
> ptshard VARCHAR,
> 
> ord INT,
> 
> hpath VARCHAR,
> 
> PRIMARY KEY (ptshard, ord)
> 
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
> 
> 
> In both cases, I'm deleting the entire record (i.e. specifying just the first
> component of the primary key in the delete statement). Most records in
> bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
> handful of rows, but a few records can have up to 10,000.
> 
> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable doesn't
> seem like nearly enough to create a memory problem. Perhaps there are other
> flaws in the memory metering. Or perhaps there is some other issue that causes
> Cassandra to mismanage the heap when there are a lot of deletes. One other
> thought I had is that I page through these tables and clean them out as I go.
> Perhaps there is some interaction between the paging and the deleting that
> causes the GC problems and I should create a list of keys to delete and then
> delete them after I've finished reading the entire table.
> 
> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
> GB, in hopes that it would force Cassandra to flush tables before I ran into
> death by GC, but it didn't seem to help.
> 
> I'm using Cassandra 2.0.4.
> 
> Any insights would be greatly appreciated. I can't be the only one that has
> periodic delete-heavy workloads. Hopefully someone else has run into this and
> can give advice.
> 
> Thanks
> 
> Robert



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com



Re: Lots of deletions results in death by GC

Posted by Nate McCall <na...@thelastpickle.com>.
What's the output of 'nodetool tpstats' while this is happening?
Specifically is Flush Writer "All time blocked" increasing? If so, play
around with turning up memtable_flush_writers and memtable_flush_queue_size
and see if that helps.
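
(For reference, these live in cassandra.yaml; the 2.0.x defaults are
roughly as follows, but check your own yaml:)

memtable_flush_writers: 1      # default is the number of data directories
memtable_flush_queue_size: 4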


On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rw...@fold3.com> wrote:

> A few days ago I posted about an issue I'm having where GC takes a long
> time (20-30 seconds), and it happens repeatedly and basically no work gets
> done. I've done further investigation, and I now believe that I know the
> cause. If I do a lot of deletes, it creates memory pressure until the
> memtables are flushed, but Cassandra doesn't flush them. If I manually
> flush, then life is good again (although that takes a very long time
> because of the GC issue). If I just leave the flushing to Cassandra, then I
> end up with death by GC. I believe that when the memtables are full of
> tombstones, Cassandra doesn't realize how much memory the memtables are
> actually taking up, and so it doesn't proactively flush them in order to
> free up heap.
>
> As I was deleting records out of one of my tables, I was watching it via
> nodetool cfstats, and I found a very curious thing:
>
>                 Memtable cell count: 1285
>                 Memtable data size, bytes: 0
>                 Memtable switch count: 56
>
> As the deletion process was chugging away, the memtable cell count
> increased, as expected, but the data size stayed at 0. No flushing
> occurred.
>
> Here's the schema for this table:
>
> CREATE TABLE bdn_index_pub (
>
> tshard VARCHAR,
>
> pord INT,
>
> ord INT,
>
> hpath VARCHAR,
>
> page BIGINT,
>
> PRIMARY KEY (tshard, pord)
>
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>
> I have a few tables that I run this cleaning process on, and not all of
> them exhibit this behavior. One of them reported an increasing number of
> bytes, as expected, and it also flushed as expected. Here's the schema for
> that table:
>
>
> CREATE TABLE bdn_index_child (
>
> ptshard VARCHAR,
>
> ord INT,
>
> hpath VARCHAR,
>
> PRIMARY KEY (ptshard, ord)
>
> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>
> In both cases, I'm deleting the entire record (i.e. specifying just the
> first component of the primary key in the delete statement). Most records
> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
> just a handful of rows, but a few records can have up to 10,000.
>
> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
> doesn't seem like nearly enough to create a memory problem. Perhaps there
> are other flaws in the memory metering. Or perhaps there is some other
> issue that causes Cassandra to mismanage the heap when there are a lot of
> deletes. One other thought I had is that I page through these tables and
> clean them out as I go. Perhaps there is some interaction between the
> paging and the deleting that causes the GC problems and I should create a
> list of keys to delete and then delete them after I've finished reading the
> entire table.
>
> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to
> 1 GB, in hopes that it would force Cassandra to flush tables before I ran
> into death by GC, but it didn't seem to help.
>
> I'm using Cassandra 2.0.4.
>
> Any insights would be greatly appreciated. I can't be the only one that
> has periodic delete-heavy workloads. Hopefully someone else has run into
> this and can give advice.
>
> Thanks
>
> Robert
>



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com