You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Pavel Kirienko <pa...@gmail.com> on 2013/07/05 19:46:29 UTC
select count(1) and OOM
Hi everyone,
I was playing with a single-node Cassandra installation when discovered
that a request like [SELECT COUNT(*) FROM CF] seems to load the entire
dataset of CF into RAM. I am not sure is it expected to behave this way or
not. I'd expect it to iterate through the entire set of rows rather than
collect values in memory.
My steps:
create table big_table (
k int primary key,
idx bigint,
val ascii,
ts timestamp
);
create index on big_table (idx);
I filled the table above with 400 random rows, where column 'val' was
written with random strings of 10MB each. Thus I came up roughly with 4GB
of data.
At this point everything is fine, response delays are pretty good and
memory consumption is adequate.
Things go bad with a counting request like [SELECT COUNT(1) FROM big_table]
- that makes the database die with OOM. However, it is possible to fetch
any column except the huge one: [SELECT k FROM big_table] - this works okay.
As far as I understand, a counting request works roughly the same way as
[SELECT * FROM] with only difference that it doesn't return any data back.
Is my reasoning correct?
Thanks in advance,
Pavel.
Re: select count(1) and OOM
Posted by Sylvain Lebresne <sy...@datastax.com>.
> I was playing with a single-node Cassandra installation when discovered
> that a request like [SELECT COUNT(*) FROM CF] seems to load the entire
> dataset of CF into RAM.
>
This is the case (the whole CF will be loaded in memory). And it's
currently a know limitation of Cassandra 1.2. This will be fix in Cassandra
2.0 but require some ground work (made in
https://issues.apache.org/jira/browse/CASSANDRA-4415) that is too complex
to backport in 1.2. So avoid those count queries for now unless you know
the data set is small.
> As far as I understand, a counting request works roughly the same way as
> [SELECT * FROM] with only difference that it doesn't return any data back.
> Is my reasoning correct?
>
That part is pretty much correct. If you do SELECT * FROM CF (without any
WHERE clause that is), it will also load the whole CF in memory and I would
bet that this OOM as well (if the count(*) OOM).
--
Sylvain