Posted to user@cassandra.apache.org by Thorsten von Eicken <tv...@rightscale.com> on 2011/11/30 08:36:56 UTC

trouble with deleted counter columns

Running a single 1.0.3 node and using counter columns I have a problem.
I have rows with ~200k counters. I deleted a number of such rows and now
I can't put counters back in, or really, I can't query what I put back in.

Example using the cli:
[default@rslog_production] get req_word_freq['20111124'];
Returned 0 results.
Elapsed time: 2089 msec(s).
[default@rslog_production] incr req_word_freq['20111124']['test'];
Value incremented.
[default@rslog_production] get req_word_freq['20111124'];
Returned 0 results.
Elapsed time: 2018 msec(s).

Note how long it's taking, presumably because it's going through 200K+
tombstones?

Here's the same using a fresh row key, note the timings:
[default@rslog_production] get req_word_freq['test'];
Returned 0 results.
Elapsed time: 1 msec(s).
[default@rslog_production] incr req_word_freq['test']['test'];
Value incremented.
[default@rslog_production] get req_word_freq['test'];
=> (counter=test, value=1)
Returned 1 results.
Elapsed time: 6 msec(s).

Incidentally, I then tried deleting the row and re-incrementing the
counter, and I don't understand why the value is 2 at the end:
[default@rslog_production] del req_word_freq['test'];
row removed.
[default@rslog_production] get req_word_freq['test'];
Returned 0 results.
Elapsed time: 1 msec(s).
[default@rslog_production] incr req_word_freq['test']['test'];
Value incremented.
[default@rslog_production] get req_word_freq['test'];
=> (counter=test, value=2)
Returned 1 results.
Elapsed time: 1 msec(s).

All this is on a single node system, running the cassandra-cli on the
system itself. The CF is as follows:
[default@rslog_production] describe req_word_freq;
    ColumnFamily: req_word_freq
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
      Key cache size / save period in seconds: 200000.0/14400
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

I must be missing something...
Thorsten


Re: trouble with deleted counter columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Wed, Nov 30, 2011 at 8:36 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:
> Running a single 1.0.3 node and using counter columns I have a problem.
> I have rows with ~200k counters. I deleted a number of such rows and now
> I can't put counters back in, or really, I can't query what I put back in.

The reason is explained at
http://wiki.apache.org/cassandra/Counters#Technical_limitations,
though the page did not make it clear that it covers your situation
(I've just updated it). To rephrase: counter removal is only supported
if it is definitive. You cannot increment a counter after deleting it;
or rather, if you do, the behavior is undefined. This holds for row
deletion too: if you delete a row, you can't increment any counter
that was in it previously. (In truth, it would work if you waited long
enough, but how long is enough depends on things like when compaction
happens and what your gc_grace value is.)

I understand this could be a problem for your use case, but it is an
unfortunate limitation of the current design.
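
To put a rough number on "waiting long enough": tombstones can only be
dropped once gc_grace has elapsed, and even then only when a compaction
actually processes them. The Python sketch below just does that
arithmetic for the gc_grace value shown in the CF description (864000
seconds). The deletion time and the helper name earliest_purge_time are
made up for illustration; the thread does not say when the rows were
deleted.

```python
from datetime import datetime, timedelta, timezone

# "GC grace seconds" from the `describe req_word_freq` output in this thread.
GC_GRACE_SECONDS = 864000


def earliest_purge_time(deleted_at, gc_grace_seconds=GC_GRACE_SECONDS):
    """Earliest time a tombstone written at `deleted_at` may be dropped.

    This is only a lower bound: the tombstone is actually removed when a
    later compaction rewrites the data that contains it.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


# Hypothetical deletion time (an assumption, not taken from the thread).
deleted_at = datetime(2011, 11, 29, 12, 0, 0, tzinfo=timezone.utc)

print(f"gc_grace: {GC_GRACE_SECONDS / 86400:g} days")
print("tombstones eligible for purge from:",
      earliest_purge_time(deleted_at).isoformat())
```

With these settings the wait is at least ten days, plus however long it
takes for compaction to reach the affected data, so the result is a
lower bound rather than a guarantee.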

> Example using the cli:
> [default@rslog_production] get req_word_freq['20111124'];
> Returned 0 results.
> Elapsed time: 2089 msec(s).
> [default@rslog_production] incr req_word_freq['20111124']['test'];
> Value incremented.
> [default@rslog_production] get req_word_freq['20111124'];
> Returned 0 results.
> Elapsed time: 2018 msec(s).
>
> Note how long it's taking, presumably because it's going through 200K+
> tombstones?

That is likely the reason, yes.
