Posted to user@cassandra.apache.org by buddhasystem <po...@bnl.gov> on 2011/02/02 16:35:31 UTC

Counters in 0.8 -- conditional?

I'm looking at
http://wiki.apache.org/cassandra/Counters

So, the counter feature -- it doesn't seem to count rows based on criteria,
such as index condition. Is that correct?

Case in point: I keep a large inventory of computational tasks over a long
period of time. I'm supposed to report, for fairly arbitrary periods of time,
the failure counts and other exit-status conditions. So nothing except an
aggregate function will work (unless I do a scan, which is clearly
unmanageable).

TIA.

-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5985214.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Counters in 0.8 -- conditional?

Posted by Sylvain Lebresne <sy...@datastax.com>.
>
> Thanks. Yes, I know it's by no means trivial. I thought that if there was an
> index on the column on which I want to place a condition, the index machinery
> itself could do the counting (i.e. when the index is updated, the counter is
> incremented). It doesn't seem too orthogonal to the current implementation,
> at least from my very limited experience.
>

It's actually not that easy in Cassandra, since a main feature is that writes
don't imply a read. But if you want your index to count the number of rows
where foo=bar, then when a new column foo=bar is written you need to know
whether the column already existed in this row with that value, in which case
you should not increment. Similarly, if you write foo=42, you need to check
whether the previous value was foo=bar and, if so, decrement your index.
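To make the read-before-write cost concrete, here is a minimal in-memory sketch of an index that keeps an exact count of rows where foo=bar. The class name `CountingIndex` and its fields are hypothetical illustrations, not part of any Cassandra API; the point is only that every write to the indexed column must first look up the old value.

```python
class CountingIndex:
    """Hypothetical model of an index that counts rows where column == value."""

    def __init__(self, column, value):
        self.column = column
        self.value = value
        self.rows = {}   # row key -> {column name: column value}
        self.count = 0   # number of rows currently matching column == value
        self.reads = 0   # reads of old values forced by index maintenance

    def write(self, row_key, column, new_value):
        row = self.rows.setdefault(row_key, {})
        if column == self.column:
            # The extra read: fetch the previous value before overwriting it.
            self.reads += 1
            was_match = column in row and row[column] == self.value
            is_match = new_value == self.value
            if is_match and not was_match:
                self.count += 1   # row newly matches: increment
            elif was_match and not is_match:
                self.count -= 1   # row stopped matching: decrement
            # Rewriting foo=bar over foo=bar leaves the count unchanged.
        row[column] = new_value


idx = CountingIndex("foo", "bar")
idx.write("r1", "foo", "bar")   # new match
idx.write("r1", "foo", "bar")   # same value again, must not double count
idx.write("r2", "foo", "bar")   # second matching row
idx.write("r1", "foo", 42)      # r1 no longer matches
print(idx.count, idx.reads)     # 1 4
```

Without the lookup in `write`, the second write to `r1` would be counted twice and the switch to `foo=42` would never be subtracted, which is exactly why a blind, read-free write path cannot maintain such a count.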

--
Sylvain

Re: Counters in 0.8 -- conditional?

Posted by buddhasystem <po...@bnl.gov>.
Thanks. Yes, I know it's by no means trivial. I thought that if there was an
index on the column on which I want to place a condition, the index machinery
itself could do the counting (i.e. when the index is updated, the counter is
incremented). It doesn't seem too orthogonal to the current implementation,
at least from my very limited experience.

Maxim


Re: Counters in 0.8 -- conditional?

Posted by Peter Schuller <pe...@infidyne.com>.
> Thanks. Just wanted to note that counting the number of rows where foo=bar is
> a fairly ubiquitous task in db applications. In the case of "big data",
> moving all this data to the client just to count something isn't optimal
> at all.

You can ask Cassandra to do the counting, but the cost still involves
reading the data on the Cassandra end. Hence, O(n) rather than O(1).
(It would obviously be nice if counts could be done in O(1), but it's
neither trivial to implement nor obvious how to do it in a way that is
generally useful. Even non-distributed databases like PostgreSQL have
issues with that.)
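The O(n) point can be illustrated with a small sketch: a predicate count has to examine every row, whether the loop runs on the client or inside the server; moving it server-side only saves the network transfer. The function `count_where` and the sample data are hypothetical and stand in for whatever scan the database performs internally.

```python
def count_where(rows, column, value):
    """Count rows with row[column] == value by examining every row: O(n)."""
    examined = 0
    matches = 0
    for row in rows.values():
        examined += 1
        if row.get(column) == value:
            matches += 1
    return matches, examined


# Hypothetical data: 1000 tasks, every tenth one failed.
rows = {f"task{i}": {"status": "failed" if i % 10 == 0 else "ok"} for i in range(1000)}
matches, examined = count_where(rows, "status", "failed")
print(matches, examined)  # 100 1000
```

Reading a precomputed count would instead be a single lookup, O(1), but only because the cost has been shifted onto every write that has to keep that count accurate.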

-- 
/ Peter Schuller

Re: Counters in 0.8 -- conditional?

Posted by buddhasystem <po...@bnl.gov>.
Thanks. Just wanted to note that counting the number of rows where foo=bar is
a fairly ubiquitous task in db applications. In the case of "big data",
moving all this data to the client just to count something isn't optimal
at all.

Maxim


Re: Counters in 0.8 -- conditional?

Posted by Peter Schuller <pe...@infidyne.com>.
> I'm looking at
> http://wiki.apache.org/cassandra/Counters
>
> So, the counter feature -- it doesn't seem to count rows based on criteria,
> such as index condition. Is that correct?

Yes, it's just about supporting counters in and of themselves (which
is non-trivial in a distributed system). It is unrelated to counting
rows or columns, unless the application happens to use them for that.
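One way an application might use counters for the task-inventory case above is to increment a per-(status, hour) counter once when each task finishes, then sum the buckets covering the requested period. This is a hypothetical sketch modeled in memory with `collections.Counter`; in Cassandra 0.8 each bucket would presumably map to a counter column, but that mapping, the one-hour granularity, and the assumption that each task's exit is recorded exactly once are all illustrative choices, not something stated in this thread.

```python
from collections import Counter
from datetime import datetime, timedelta


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


class StatusCounters:
    """Hypothetical per-(status, hour) counters, incremented once per finished task."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()  # (status, hour start) -> number of tasks

    def record_exit(self, status: str, finished_at: datetime) -> None:
        # Blind increment: no read of existing data, assuming each task's
        # exit is recorded exactly once.
        self.counts[(status, hour_bucket(finished_at))] += 1

    def total(self, status: str, start: datetime, end: datetime) -> int:
        """Sum hourly buckets from hour_bucket(start) up to, not including, end.

        Cost grows with the number of hours in the range, not the number of
        tasks. Assumes start and end fall on hour boundaries.
        """
        total = 0
        hour = hour_bucket(start)
        while hour < end:
            total += self.counts[(status, hour)]  # missing buckets read as 0
            hour += timedelta(hours=1)
        return total


c = StatusCounters()
c.record_exit("failed", datetime(2011, 2, 1, 10, 15))
c.record_exit("failed", datetime(2011, 2, 1, 10, 45))
c.record_exit("ok", datetime(2011, 2, 1, 11, 5))
c.record_exit("failed", datetime(2011, 2, 1, 13, 0))
print(c.total("failed", datetime(2011, 2, 1, 10), datetime(2011, 2, 1, 12)))  # 2
print(c.total("failed", datetime(2011, 2, 1, 10), datetime(2011, 2, 1, 14)))  # 3
```

Because a task's exit status is written once rather than overwritten, the increment needs no prior read, which sidesteps the read-before-write problem that a general "rows where foo=bar" count runs into. The trade-off is that the reporting granularity is fixed in advance by the bucket size.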

-- 
/ Peter Schuller