Posted to user@cassandra.apache.org by Tom van den Berge <to...@drillster.com> on 2016/03/04 14:56:38 UTC

Unexplainably large reported partition sizes

Hi,

I'm seeing warnings in my logs about compacting large partitions, e.g.:

 Compacting large partition
drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)

This means that this single partition is about 1.4 GB. This is much
larger than it can possibly be, for two reasons:
  1) the partition has approx. 50K rows, each roughly 62 bytes = ~3 MB
  2) the entire table consumes approx. 500 MB of disk space on the node
containing the partition (including snapshots)
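A minimal sketch of the arithmetic behind reason 1, taking the approximate figures above at face value (50K rows of ~62 bytes each, ignoring Cassandra's per-cell storage overhead):

```python
# Figures from the post above; row count and row size are approximations.
ROWS = 50_000
BYTES_PER_ROW = 62               # rough estimate, ignores per-cell overhead
REPORTED_BYTES = 1_470_058_292   # size from the "Compacting large partition" warning

expected_bytes = ROWS * BYTES_PER_ROW
ratio = REPORTED_BYTES / expected_bytes

print(f"expected ~{expected_bytes / 1e6:.1f} MB")  # expected ~3.1 MB
print(f"reported is ~{ratio:.0f}x larger")         # reported is ~474x larger
```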

Furthermore, nodetool cfstats tells me this:
Space used (live): 253,928,111
Space used (total): 253,928,111
Compacted partition maximum bytes: 2,395,318,855
The space used seems to match the actual size (excl. snapshots), but the
Compacted partition maximum bytes (2.3 GB) seems to be far higher than
possible. Does anyone know how it is possible that Cassandra reports such
unlikely sizes?
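For scale, the cfstats maximum can be compared directly with the table's entire live footprint on this node; a one-off calculation using only the figures quoted above:

```python
LIVE_BYTES = 253_928_111             # "Space used (live)" from nodetool cfstats
MAX_PARTITION_BYTES = 2_395_318_855  # "Compacted partition maximum bytes"

# Ratio of the largest reported partition to everything this table stores live on the node.
factor = MAX_PARTITION_BYTES / LIVE_BYTES
print(f"largest partition = {factor:.1f}x the table's live space")  # 9.4x
```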

From time to time, I'm noticing relatively bad latencies when such
partitions are (fully) queried. So I'm not entirely convinced that the actual
partition size isn't on the order of 1 or 2 GB. Does anyone have an
explanation for these discrepancies?

Thanks,
Tom

Re: Unexplainably large reported partition sizes

Posted by Tom van den Berge <to...@drillster.com>.
No, data is hardly ever deleted from this table. The cfstats output confirms this.
The data is also not reinserted.
On 5 Mar 2016 6:20 PM, "DuyHai Doan" <do...@gmail.com> wrote:

> Maybe tombstones? Do you issue a lot of DELETE statements? Or do you
> re-insert into the same partition with different TTL values?

Re: Unexplainably large reported partition sizes

Posted by DuyHai Doan <do...@gmail.com>.
Maybe tombstones? Do you issue a lot of DELETE statements? Or do you
re-insert into the same partition with different TTL values?

On Sat, Mar 5, 2016 at 7:16 PM, Tom van den Berge <to...@drillster.com> wrote:

> I don't think compression can be the cause of the difference, for
> two reasons:
>
> 1) The partition size I calculated myself (3 MB) is the uncompressed size,
> and so is the reported size (2.3 GB)
>
> 2) The difference is simply far too big to be explained by compression,
> even if my calculated size were the compressed size. That would mean
> compressing to 0.125% of the original, which is not realistic. In the
> logs, I can see that the typical compression achieved for this
> table is around 80% of the original.
>
> Tom
>
> On Fri, Mar 4, 2016 at 9:48 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <to...@drillster.com>
>> wrote:
>>
>>>  Compacting large partition
>>> drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)
>>>
>>> This means that this single partition is about 1.4 GB. This is much
>>> larger than it can possibly be, for two reasons:
>>>   1) the partition has approx. 50K rows, each roughly 62 bytes = ~3 MB
>>>   2) the entire table consumes approx. 500 MB of disk space on the node
>>> containing the partition (including snapshots)
>>>
>>> Furthermore, nodetool cfstats tells me this:
>>> Space used (live): 253,928,111
>>> Space used (total): 253,928,111
>>> Compacted partition maximum bytes: 2,395,318,855
>>> The space used seems to match the actual size (excl. snapshots), but the
>>> Compacted partition maximum bytes (2.3 GB) seems to be far higher than
>>> possible. Does anyone know how it is possible that Cassandra reports such
>>> unlikely sizes?
>>>
>>
>> Compression is enabled by default, and compaction reports the
>> uncompressed size.
>>
>> =Rob
>>
>>
>
>
>
> --
> Tom van den Berge
> Lead Software Engineer
>
> [image: Drillster]
>
> Middenburcht 136
> 3452 MT Vleuten
> Netherlands +31 30 755 53 30
> www.drillster.com
>
> [image: Follow us on Facebook] Follow us on Facebook
> <https://www.facebook.com/Drillster>
>

Re: Unexplainably large reported partition sizes

Posted by Tom van den Berge <to...@drillster.com>.
Hi Bryan,


> Do you use any collections on this column family? We've had issues in the
> past with unexpectedly large partitions reported on data models with
> collections, which can also generate tons of tombstones on UPDATE (
> https://issues.apache.org/jira/browse/CASSANDRA-10547)
>

I was bitten by this one some time ago, too. I stopped using
collections because of this. The table in question doesn't use them either.

Thanks for the suggestion anyway!
Tom

Re: Unexplainably large reported partition sizes

Posted by Bryan Cheng <br...@blockcypher.com>.
Hi Tom,

Do you use any collections on this column family? We've had issues in the
past with unexpectedly large partitions reported on data models with
collections, which can also generate tons of tombstones on UPDATE (
https://issues.apache.org/jira/browse/CASSANDRA-10547)

--Bryan


On Mon, Mar 7, 2016 at 11:23 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sat, Mar 5, 2016 at 9:16 AM, Tom van den Berge <to...@drillster.com>
> wrote:
>
>> I don't think compression can be the cause of the difference, because of
>> two reasons:
>>
>
> Your two reasons seem legitimate.
>
> Though you say you do not frequently do DELETE and so it shouldn't be due
> to tombstones, there are semi-recent versions of Cassandra which create a
> runaway avalanche of tombstones that double every time they are compacted.
> What version are you running?
>
> Also, is there some reason you are not just dumping the table with
> sstable2json and inspecting the contents of the row in question?
>
> =Rob
>
>
>
>

Re: Unexplainably large reported partition sizes

Posted by Tom van den Berge <to...@drillster.com>.
Thanks guys. I've upgraded to 2.2.5, and the problem is gone.


Tom

On Wed, Mar 9, 2016 at 10:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Mar 7, 2016 at 1:25 PM, Nate McCall <na...@thelastpickle.com>
> wrote:
>
>>
>>> Rob, can you remember which bug/jira this was? I have not been able to
>>> find it.
>>> I'm using 2.1.9.
>>>
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-7953
>>
>> Rob may have a different one, but I've seen something similar from this issue.
>> Fixed in 2.1.12.
>>
>
> Nate is correct, I was referring to CASSANDRA-7953... :)
>
> =Rob
>
>


Re: Unexplainably large reported partition sizes

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Mar 7, 2016 at 1:25 PM, Nate McCall <na...@thelastpickle.com> wrote:

>
>> Rob, can you remember which bug/jira this was? I have not been able to
>> find it.
>> I'm using 2.1.9.
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-7953
>
> Rob may have a different one, but I've seen something similar from this issue.
> Fixed in 2.1.12.
>

Nate is correct, I was referring to CASSANDRA-7953... :)

=Rob

Re: Unexplainably large reported partition sizes

Posted by Nate McCall <na...@thelastpickle.com>.
>
>
> Rob, can you remember which bug/jira this was? I have not been able to
> find it.
> I'm using 2.1.9.
>
>
https://issues.apache.org/jira/browse/CASSANDRA-7953

Rob may have a different one, but I've seen something similar from this issue.
Fixed in 2.1.12.
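A small sketch for checking whether a 2.1 node falls in the range described here, assuming plain dotted numeric version strings (e.g. "2.1.9"). It only encodes the 2.1.12 fix version stated in this message; fix versions for other branches are not given in this thread, so the hypothetical helper returns None for them rather than guess:

```python
from typing import Optional, Tuple

# Fix version for CASSANDRA-7953 on the 2.1 branch, as stated in this thread.
FIXED_IN_2_1: Tuple[int, int, int] = (2, 1, 12)

def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.1.9" into (2, 1, 9); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))

def affected_on_2_1(version: str) -> Optional[bool]:
    """True if a 2.1.x release predates the fix, False if it includes it,
    None for any other branch (not covered by this thread)."""
    parts = parse_version(version)
    if parts[:2] != (2, 1):
        return None
    return parts < FIXED_IN_2_1

print(affected_on_2_1("2.1.9"))  # True: the version in use in this thread
```

The 2.2.5 upgrade reported in this thread returns None here, since this sketch deliberately says nothing about branches other than 2.1.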


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Unexplainably large reported partition sizes

Posted by Tom van den Berge <to...@drillster.com>.
Hi Rob,

The reason I didn't dump the table with sstable2json is that I didn't think
of it ;) I just used it, and it looks very much like the "avalanche of
tombstones" bug you are describing!

I took one of the three sstables containing the key, and it resulted in a
4.75 million-line JSON file, of which 4.73 million lines contain a
tombstone ("t")!
The timestamps of the tombstones I've checked were all many months old, so
obviously compaction failed to clean them up. I can also see many, many
identical tombstoned rows.
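The line count described above can be reproduced with a short script. A minimal sketch, assuming (as the line counts suggest) that the sstable2json dump has roughly one cell per line and that tombstoned entries are the lines carrying the "t" marker; the script and file names are hypothetical, and a cell whose value is literally "t" would also be counted:

```python
import sys
from typing import Iterable, Tuple

def count_tombstone_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_lines, marked_lines); a line is marked if it contains
    the JSON string "t". Crude heuristic, not a real JSON parse."""
    total = 0
    marked = 0
    for line in lines:
        total += 1
        if '"t"' in line:
            marked += 1
    return total, marked

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python count_tombstones.py dump.json
    with open(sys.argv[1], encoding="utf-8") as f:
        total, marked = count_tombstone_lines(f)
    share = marked / total if total else 0.0
    print(f"{marked:,} of {total:,} lines ({share:.1%}) carry a tombstone marker")
```

For the sstable described above, 4.73 million out of 4.75 million lines works out to roughly 99.6%.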

Rob, can you remember which bug/jira this was? I have not been able to find
it.
I'm using 2.1.9.

Thanks a lot for pointing me in this direction!
Tom

Re: Unexplainably large reported partition sizes

Posted by Robert Coli <rc...@eventbrite.com>.
On Sat, Mar 5, 2016 at 9:16 AM, Tom van den Berge <to...@drillster.com> wrote:

> I don't think compression can be the cause of the difference, because of
> two reasons:
>

Your two reasons seem legitimate.

Though you say you do not frequently do DELETE and so it shouldn't be due
to tombstones, there are semi-recent versions of Cassandra which create a
runaway avalanche of tombstones that double every time they are compacted.
What version are you running?
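To see why a doubling bug gets out of hand so quickly, here is a worked estimate under a deliberately simple assumption: the partition starts at the ~3.1 MB computed from the row count (50K x 62 bytes) and exactly doubles on every compaction. Real compaction behaviour is messier, so treat this as an order-of-magnitude illustration only:

```python
import math

EXPECTED_BYTES = 50_000 * 62     # ~3.1 MB of real data, per the original post
REPORTED_BYTES = 1_470_058_292   # size from the "Compacting large partition" warning

# Assumption: pure 2**k growth, one doubling per compaction.
doublings = math.log2(REPORTED_BYTES / EXPECTED_BYTES)
print(f"~{doublings:.1f} doublings")  # ~8.9 doublings
```

Under that assumption, fewer than ten compactions would be enough to reach the reported size.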

Also, is there some reason you are not just dumping the table with
sstable2json and inspecting the contents of the row in question?

=Rob

Re: Unexplainably large reported partition sizes

Posted by Tom van den Berge <to...@drillster.com>.
I don't think compression can be the cause of the difference, for
two reasons:

1) The partition size I calculated myself (3 MB) is the uncompressed size,
and so is the reported size (2.3 GB)

2) The difference is simply far too big to be explained by compression,
even if my calculated size were the compressed size. That would mean
compressing to 0.125% of the original, which is not realistic. In the
logs, I can see that the typical compression achieved for this
table is around 80% of the original.
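The 0.125% figure follows directly from the two sizes; a quick check, assuming the 3 MB estimate and the cfstats maximum are both uncompressed sizes as described:

```python
CALCULATED_BYTES = 3_000_000    # "3 MB" estimated from row count x row size
REPORTED_BYTES = 2_395_318_855  # "Compacted partition maximum bytes" from cfstats
TYPICAL_RATIO = 0.80            # typical compressed/original ratio seen in the logs

implied_ratio = CALCULATED_BYTES / REPORTED_BYTES
print(f"implied ratio: {implied_ratio:.3%}")  # implied ratio: 0.125%
print(f"that is {TYPICAL_RATIO / implied_ratio:.0f}x stronger compression than usual")
```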

Tom

On Fri, Mar 4, 2016 at 9:48 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <to...@drillster.com>
> wrote:
>
>>  Compacting large partition
>> drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)
>>
>> This means that this single partition is about 1.4 GB. This is much
>> larger than it can possibly be, for two reasons:
>>   1) the partition has approx. 50K rows, each roughly 62 bytes = ~3 MB
>>   2) the entire table consumes approx. 500 MB of disk space on the node
>> containing the partition (including snapshots)
>>
>> Furthermore, nodetool cfstats tells me this:
>> Space used (live): 253,928,111
>> Space used (total): 253,928,111
>> Compacted partition maximum bytes: 2,395,318,855
>> The space used seems to match the actual size (excl. snapshots), but the
>> Compacted partition maximum bytes (2.3 GB) seems to be far higher than
>> possible. Does anyone know how it is possible that Cassandra reports such
>> unlikely sizes?
>>
>
> Compression is enabled by default, and compaction reports the uncompressed
> size.
>
> =Rob
>
>


Re: Unexplainably large reported partition sizes

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <to...@drillster.com> wrote:

>  Compacting large partition
> drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)
>
> This means that this single partition is about 1.4 GB. This is much
> larger than it can possibly be, for two reasons:
>   1) the partition has approx. 50K rows, each roughly 62 bytes = ~3 MB
>   2) the entire table consumes approx. 500 MB of disk space on the node
> containing the partition (including snapshots)
>
> Furthermore, nodetool cfstats tells me this:
> Space used (live): 253,928,111
> Space used (total): 253,928,111
> Compacted partition maximum bytes: 2,395,318,855
> The space used seems to match the actual size (excl. snapshots), but the
> Compacted partition maximum bytes (2.3 GB) seems to be far higher than
> possible. Does anyone know how it is possible that Cassandra reports such
> unlikely sizes?
>

Compression is enabled by default, and compaction reports the uncompressed
size.
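The compression explanation can be checked against the numbers from the original post. A rough sketch, assuming the ~80% compressed-to-original ratio reported elsewhere in this thread applies uniformly to the whole table:

```python
LIVE_BYTES = 253_928_111             # "Space used (live)": compressed, on disk
TYPICAL_RATIO = 0.80                 # compressed/original, from the compaction logs
MAX_PARTITION_BYTES = 2_395_318_855  # "Compacted partition maximum bytes" (uncompressed)

# Undo compression for the whole table, then compare with the single largest partition.
uncompressed_table = LIVE_BYTES / TYPICAL_RATIO
print(f"whole table, uncompressed: ~{uncompressed_table / 1e6:.0f} MB")  # ~317 MB
print(f"largest partition is still {MAX_PARTITION_BYTES / uncompressed_table:.1f}x that")
```

Compression can make a reported size exceed the on-disk size, but under this assumption not by anything close to the observed gap.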

=Rob