Posted to user@cassandra.apache.org by feedly team <fe...@gmail.com> on 2012/08/20 18:49:13 UTC

get_slice on wide rows

I have a column family that I am using for consistency purposes. Basically,
a marker column is written to a row in this family before some actions take
place and is deleted only after all the actions complete. The idea is that
if something goes horribly wrong, this column family can be read to see
what needs to be fixed.

In my dev environment things worked as planned, but in a larger-scale,
high-traffic environment the slice query times out and then Cassandra
quickly runs out of memory. The main difference is that there is a very
large number of writes (and deleted columns) in the row my code is
attempting to read. Is the problem that Cassandra is attempting to load all
the deleted columns into memory? I did an sstable2json dump and saw that
the "d" deletion marker seemed to be present for the columns, though I
didn't write any code to check all values. Is the solution here
partitioning the wide row into multiple narrower rows?
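
A minimal sketch of the marker pattern described above, assuming the
pycassa Python client; the keyspace, column family, and key names are
made up for illustration:

import uuid

import pycassa

def do_actions():
    pass  # stand-in for the real application work

# hypothetical names -- adjust to the real schema
pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
markers = pycassa.ColumnFamily(pool, 'Markers')

row_key = 'pending_actions'   # the single wide row
marker = uuid.uuid1().bytes   # one column per in-flight action

markers.insert(row_key, {marker: ''})      # write the marker first
do_actions()                               # do the real work
markers.remove(row_key, columns=[marker])  # delete only on success

# recovery: any column still in the row is work that never finished
# (raises pycassa.NotFoundException if the row has no live columns)
stranded = markers.get(row_key, column_count=100)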

Re: get_slice on wide rows

Posted by aaron morton <aa...@thelastpickle.com>.
> Is the problem that Cassandra is attempting to load all the deleted columns into memory?
Yup. 

The talk by Matt Dennis at the Cassandra Summit may be of interest to you. He talks about similar things: http://www.datastax.com/events/cassandrasummit2012/presentations

Drop gc_grace_seconds to 1 so that tombstones can be purged faster at the next compaction. A column-level tombstone still has to hit disk so that it overwrites any existing copy of the column on disk (so setting gc_grace_seconds to 0 has pretty much the same effect as 1).
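
A sketch of that schema change, assuming pycassa's SystemManager and the
same hypothetical keyspace/CF names as above:

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('localhost:9160')
# drop purge eligibility from the default 10 days (864000s) to 1s so
# tombstones can be removed by the next compaction that includes them
sys_mgr.alter_column_family('MyKeyspace', 'Markers', gc_grace_seconds=1)
sys_mgr.close()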

You may also want to try levelled compaction on the CF: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
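
Switching the CF over can be done the same way; this assumes a Cassandra
1.0+ cluster and a pycassa version that exposes compaction_strategy:

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('localhost:9160')
sys_mgr.alter_column_family('MyKeyspace', 'Markers',
                            compaction_strategy='LeveledCompactionStrategy')
sys_mgr.close()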

> Is the solution here partitioning the wide row into multiple narrower rows?
That's also sensible. I would give the approaches above a try first; they may give you more bang for your buck.
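
If you do partition, one common shape is time bucketing: derive the row
key from the current time so each row stays narrow and old, tombstone-heavy
rows simply stop being read. A sketch with the same hypothetical names:

import time
import uuid

import pycassa

BUCKET_SECONDS = 3600  # one row per hour; tune to the write rate

def bucketed_key(ts=None):
    ts = time.time() if ts is None else ts
    return 'pending_actions:%d' % (int(ts) // BUCKET_SECONDS)

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
markers = pycassa.ColumnFamily(pool, 'Markers')

row_key = bucketed_key()  # remember it so the delete hits the same row
marker = uuid.uuid1().bytes
markers.insert(row_key, {marker: ''})
# ... do the work, then:
markers.remove(row_key, columns=[marker])

# recovery slices a handful of recent narrow rows, not one huge row
for age in range(24):
    key = bucketed_key(time.time() - age * BUCKET_SECONDS)
    try:
        stranded = markers.get(key, column_count=100)
        print key, len(stranded)
    except pycassa.NotFoundException:
        continue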
 
Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
