Posted to user@cassandra.apache.org by Aaron Morton <aa...@thelastpickle.com> on 2010/09/14 04:35:40 UTC

get_slice and deletes

I'm running the 0.7 nightly build from Aug 31 and have noticed different performance characteristics when using get_slice against a row that has seen a lot of deletes.

One row in the keyspace has around 650K columns; the columns are small, around 53 bytes each, for a total of roughly 30MB. In the last hour or so I finished deleting around 300K columns from that row (and roughly another 1M rows from other CFs); the deleted columns were ordered before those left in the row.

I stopped my processing, restarted it, and noticed that get_slice was running significantly slower than before. If I do a get_slice for 101 columns with no finish column name and vary the start column, I see very different performance:

start="" - 5 to 6 secs
start = "excer" - 5 to 6 secs
start = "excerise-2010-08-31t17-15-57-92421646-11330" - 0.5 to 0.6 secs (this is the first col in this row)

For comparison, a get_slice against another row with 232K columns in the same keyspace (different CF, same column size) with an empty start returned in 0.01 secs.
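
A minimal sketch of how the measurement can be reproduced over the raw Thrift interface, in case anyone wants to try it (the host, keyspace, column family, and row key below are placeholders, not the actual schema from this post):

import time

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, ConsistencyLevel,
                              SlicePredicate, SliceRange)

# Placeholder connection details and schema names.
socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TFramedTransport(socket)  # 0.7 uses framed transport
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()
client.set_keyspace('MyKeyspace')

parent = ColumnParent(column_family='MyColumnFamily')

def time_slice(start):
    # 101 columns, no finish column, as in the measurements above.
    predicate = SlicePredicate(slice_range=SliceRange(
        start=start, finish='', reversed=False, count=101))
    t0 = time.time()
    client.get_slice('the-row-key', parent, predicate, ConsistencyLevel.ONE)
    return time.time() - t0

for start in ('', 'excer', 'excerise-2010-08-31t17-15-57-92421646-11330'):
    print '%-45r %.2f secs' % (start, time_slice(start))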

Could a high level of deletes on a row reduce get_slice performance? Is it worth forcing the tombstones out by reducing GCGraceSeconds and doing a compaction to see what happens?

Thanks
Aaron




UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 79: ordinal not in range(128)

Posted by Claire Chang <cl...@merchantcircle.com>.
hi,

I am using pycassa with Cassandra 0.6.3 and encountered this error while inserting a column. The offending characters are in a column value. Could anyone shed some light on this issue?

UserAgent: UnkownUA
Referer: 
Traceback (most recent call last):
  File "../application/model/status2.py", line 257, in import_from_service
    json=json)
  File "../application/model/status2.py", line 57, in __init__
    lib.batch_insert(CLIENT, cf_maps, write_consistency_level=ConsistencyLevel.QUORUM)
  File "../application/lib/batchcolumnfamily.py", line 113, in batch_insert
    write_consistency_level)
  File "/usr/lib/python2.4/site-packages/pycassa/connection.py", line 185, in client_call
    return getattr(self._local.client, attr)(*args, **kwargs)
  File "/usr/lib/python2.4/site-packages/cassandra/Cassandra.py", line 780, in batch_mutate
    self.send_batch_mutate(keyspace, mutation_map, consistency_level)
  File "/usr/lib/python2.4/site-packages/cassandra/Cassandra.py", line 789, in send_batch_mutate
    args.write(self._oprot)
  File "/usr/lib/python2.4/site-packages/cassandra/Cassandra.py", line 3686, in write
    oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 79: ordinal not in range(128)
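
The usual cause of this error is a unicode object reaching Thrift serialization: Thrift writes column names and values as binary strings, so Python 2 implicitly converts any unicode object with the default 'ascii' codec, which fails on characters like u'\u2122' (the trademark sign). A minimal sketch of the typical workaround, encoding to UTF-8 before the insert (the column dict here is a made-up example, not from the original post):

# -*- coding: utf-8 -*-

def to_utf8(value):
    # Encode unicode objects to UTF-8 byte strings; pass through
    # anything that is already a plain str.
    if isinstance(value, unicode):
        return value.encode('utf-8')
    return value

# Hypothetical column data containing a non-ASCII character.
columns = {'description': u'Acme\u2122 widgets'}
safe = dict((to_utf8(name), to_utf8(val)) for name, val in columns.items())
# 'safe' can now be passed to batch_insert without tripping the
# implicit ASCII conversion inside Thrift.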

Re: get_slice and deletes

Posted by Aaron Morton <aa...@thelastpickle.com>.
Thanks, I thought as much. 

Will give it a clean out and see how it goes. 

Aaron


On 14 Sep 2010, at 03:40 PM, Jonathan Ellis <jb...@gmail.com> wrote:

On Mon, Sep 13, 2010 at 9:35 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Could a high level of deletes on a row reduce get_slice performance?

absolutely.

> Is it worth forcing the tombstones out by reducing GCGraceSeconds and
> doing a compaction to see what happens?

You can try it to verify, but that's what the problem is. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: get_slice and deletes

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Sep 13, 2010 at 9:35 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Could a high level of deletes on a row reduce get_slice performance?

absolutely.

> Is it worth forcing the tombstones out by reducing GCGraceSeconds and
> doing a compaction to see what happens?

You can try it to verify, but that's what the problem is. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
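
To make the mechanics concrete: the tombstones in the row described above sort before the surviving columns, so a slice starting at "" must read past every tombstone before it can collect 101 live columns, while a slice starting at the first live column skips them entirely. A toy model in plain Python (an illustration of the read path's behavior, not actual Cassandra code; the counts mirror the row in the original post):

# A row as a sorted list of (name, value) cells, where value=None
# stands for a tombstone. The deleted range sorts before the live one.
columns = [('del-%06d' % i, None) for i in range(300000)]    # tombstones
columns += [('live-%06d' % i, 'v') for i in range(350000)]   # live cells

def get_slice(start, count):
    # Real Cassandra seeks to 'start' via the column index; here we
    # only count cells examined from 'start' onward.
    scanned, result = 0, []
    for name, value in columns:
        if name < start:
            continue
        scanned += 1
        if value is not None:             # tombstones are filtered out...
            result.append((name, value))  # ...but still have to be read
            if len(result) == count:
                break
    return result, scanned

for start in ('', 'live-'):
    result, scanned = get_slice(start, 101)
    print 'start=%-8r read %6d cells to return %d live columns' % (
        start, scanned, len(result))

With start='' the toy model reads 300,101 cells to return 101 live columns; with start='live-' it reads just 101, matching the order-of-magnitude gap Aaron measured. Once the tombstones are compacted away after GCGraceSeconds, the two slices should cost about the same again.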