Posted to user@cassandra.apache.org by Sébastien Druon <sd...@spotuse.com> on 2011/01/28 09:42:55 UTC

Cassandra and count

Hello,

I have a question about counting in Cassandra, as I would like to count
the rows of a CF (column family):
- is it mandatory to specify a range?
- what is the cost of a count operation on a CF?

Thanks in advance for the answers

Sebastien

Re: Cassandra and count

Posted by aaron morton <aa...@thelastpickle.com>.
The 0.7 API (http://wiki.apache.org/cassandra/API) has two functions for counting the columns in a row: get_count() and multiget_count() (not listed on the wiki yet). Both take a SlicePredicate, whose slice range may have an empty start and end.

The only way to count rows is to use get_range_slices(), which returns the requested columns for each row in a key range. To reduce the bandwidth of the query, ask it to return a single column.

However, the result of these functions is not guaranteed to be correct. Cassandra does not lock its internal structures, so while it is busy processing your request other connections may be adding columns and rows. By the time the count gets back to you, it may already be wrong. The same reasoning explains why there are no aggregate functions.
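
The row-counting approach described above can be sketched as a paging loop. This is only an illustration under stated assumptions, not the Cassandra client API: fetch_page and make_in_memory_fetcher are hypothetical names, and the in-memory fetcher stands in for a range query that returns up to `limit` row keys starting at an inclusive start key. The result is still only a snapshot, for the reason given above.

```python
from bisect import bisect_left


def count_rows(fetch_page, page_size=1000):
    """Count rows by paging through row keys.

    fetch_page(start_key, limit) is a hypothetical callable that returns up
    to `limit` row keys, in the store's own order, starting at `start_key`
    inclusive. An empty start key means "from the beginning".
    """
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    total = 0
    start_key = ""
    last_seen = None
    while True:
        page = fetch_page(start_key, page_size)
        # Every page after the first begins with the previous page's last
        # key, so skip that key to avoid counting it twice.
        new_keys = [k for k in page if k != last_seen]
        total += len(new_keys)
        if len(page) < page_size:
            return total  # a short page means the range is exhausted
        last_seen = page[-1]
        start_key = last_seen


def make_in_memory_fetcher(row_keys):
    """Stand-in for a real range query, backed by a sorted list of keys."""
    ordered = sorted(row_keys)

    def fetch_page(start_key, limit):
        i = bisect_left(ordered, start_key)
        return ordered[i:i + limit]

    return fetch_page


if __name__ == "__main__":
    keys = ["row%04d" % i for i in range(2500)]
    print(count_rows(make_in_memory_fetcher(keys), page_size=1000))
```

Against a real cluster the fetcher would wrap the range query and request a single column per row, so that mostly keys travel over the wire.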

Do you need to count the rows as a one-off, or is it part of your application design?

Hope that helps
Aaron

On 29 Jan 2011, at 05:02, Victor Kabdebon wrote:

> Buddhasystem is right.
> A count returns the columns to the client, which counts them. My advice: do not count big columns / supercolumns. People on the dev team are trying to develop distributed counters, but I don't know the state of this work.
> 
> Best regards,
> Victor Kabdebon
> http://www.voxnucleus.fr
> 
> 2011/1/28 buddhasystem <po...@bnl.gov>
> 
> As far as I know, there are no aggregate operations built into Cassandra,
> which means you'll have to retrieve all of the data to count it in the
> client. I had a thread on this topic 2 weeks ago. It's pretty bad.
> 
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
> 


Re: Cassandra and count

Posted by Victor Kabdebon <vi...@gmail.com>.
Buddhasystem is right.
A count returns the columns to the client, which counts them. My advice: do
not count big columns / supercolumns. People on the dev team are trying to
develop distributed counters, but I don't know the state of this work.

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/1/28 buddhasystem <po...@bnl.gov>

>
> As far as I know, there are no aggregate operations built into Cassandra,
> which means you'll have to retrieve all of the data to count it in the
> client. I had a thread on this topic 2 weeks ago. It's pretty bad.

Re: Cassandra and count

Posted by buddhasystem <po...@bnl.gov>.
As far as I know, there are no aggregate operations built into Cassandra,
which means you'll have to retrieve all of the data to count it in the
client. I had a thread on this topic 2 weeks ago. It's pretty bad.
