Posted to user@cassandra.apache.org by Tom van den Berge <to...@gmail.com> on 2015/09/15 16:44:43 UTC

Secondary index is causing high CPU load

Read queries on a secondary index are somehow causing an excessively high
CPU load on all nodes in my DC.

The table has some 60K records, and the cardinality of the index is very
low (~10 distinct values). The returned result set typically contains
10-30K records.
The same queries on nodes in another DC are working fine. The nodes with
the high CPU are in a newly set up DC (see my previous message below). The
hardware in both DCs is the same, as well as the C* version (2.1.6). The
only difference in the C* setup is that the new DC is using vnodes (256),
while the old DC is not. Both DCs have 4 nodes, and RF=2.

I've rebuilt the index, but that didn't help.

It looks a bit like CASSANDRA-8530
<https://issues.apache.org/jira/browse/CASSANDRA-8530> (unresolved).

What really surprised me is that executing a single query on this secondary
index makes the "Local read count" in the cfstats for the index go up by
almost 200,000! When I run the same query on one of my "good" nodes, it only
increases by a small number, as I would expect.

Could it be that the use of vnodes is causing these problems?

Regards,
Tom

On Mon, Sep 14, 2015 at 8:09 PM, Tom van den Berge <
tom.vandenberge@gmail.com> wrote:

> I have a DC of 4 nodes that must be expanded to accommodate an expected
> growth in data. Since the DC is not using vnodes, we have decided to set up
> a new DC with vnodes enabled, start using the new DC, and decommission the
> old DC.
>
> Both DCs have 4 nodes. The idea is to add additional nodes to the new DC
> later on.
> The servers in both DCs are very similar: quad-core machines with 8GB.
>
> We have bootstrapped/rebuilt the nodes in the new DC. When that finished,
> the nodes in the new DC were showing little CPU activity, as you would
> expect, because they are receiving writes from the other DC. So far, so
> good.
>
> Then we switched the clients from the old DC to the new DC. The CPU load
> on all nodes in the new DC immediately rose to excessively high levels (15
> - 25), which made the servers effectively unavailable. The load did not
> drop in any sustained way within 20 minutes, so we had to switch the clients back
> to the old DC. Then the load dropped again.
>
> What can be the reason for the high CPU loads on the new nodes?
>
> Performance tests show that the servers in the new DC perform slightly
> better (both IO and CPU) than the servers in the old DC.
> I did not see anything abnormal in the Cassandra logs, like garbage
> collection warnings. I also did not see any strange things in the tpstats.
> The only difference I'm aware of between the old and new DC is the use of
> vnodes.
>
> Any help is appreciated!
> Thanks,
> Tom
>

Re: Secondary index is causing high CPU load

Posted by Tyler Hobbs <ty...@datastax.com>.

See https://issues.apache.org/jira/browse/CASSANDRA-10414 for an overview
of why vnodes are currently less efficient for secondary index queries.

On Tue, Sep 29, 2015 at 12:45 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Sep 15, 2015 at 7:44 AM, Tom van den Berge <
> tom.vandenberge@gmail.com> wrote:
>
>> Read queries on a secondary index are somehow causing an excessively high
>> CPU load on all nodes in my DC.
>>
> ...
>
>> What really surprised me is that executing a single query on this
>> secondary index makes the "Local read count" in the cfstats for the index
>> go up by almost 200,000! When I run the same query on one of my "good"
>> nodes, it only increases by a small number, as I would expect.
>>
>> Could it be that the use of vnodes is causing these problems?
>>
>
> I am not too surprised to hear of this performance degradation.
>
> Yes, it is relatively likely to be the use of vnodes which is causing this
> problem. You could verify by having one of your nodes use 64 vnodes instead
> of the default 256... you will get less even distribution with current
> vnode random allocation, but you will pay less of a penalty for having
> multiple ranges...
>
> =Rob



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
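
For readers who want a feel for the scale involved, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not spelled out in this thread: a secondary index query without a partition key restriction is executed as a scan over the token ring, and the ring is split into one range per token. The function name ranges_to_scan is made up for illustration. The model counts the tokens of a single DC only and ignores any merging of adjacent ranges the coordinator may do.

```python
def ranges_to_scan(nodes: int, num_tokens: int) -> int:
    """Token ranges in one DC under the simplified model: one range per token."""
    return nodes * num_tokens


# The 4-node setups discussed in this thread, plus the suggested 64-vnode variant.
for label, num_tokens in [("single token", 1), ("64 vnodes", 64), ("256 vnodes", 256)]:
    print(f"{label:>12}: {ranges_to_scan(4, num_tokens)} ranges across 4 nodes")
```

Under these assumptions a 4-node DC goes from 4 ranges with single tokens to 1024 ranges with 256 vnodes per node. The sketch does not try to reproduce the "Local read count" figure reported above.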

Re: Secondary index is causing high CPU load

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Sep 15, 2015 at 7:44 AM, Tom van den Berge <
tom.vandenberge@gmail.com> wrote:

> Read queries on a secondary index are somehow causing an excessively high
> CPU load on all nodes in my DC.
>
...

> What really surprised me is that executing a single query on this
> secondary index makes the "Local read count" in the cfstats for the index
> go up with almost 200000! When doing the same query on one of my "good"
> nodes, it only increases with a small number, as I would expect.
>
> Could it be that the use of vnodes is causing these problems?
>

I am not too surprised to hear of this performance degradation.

Yes, it is relatively likely to be the use of vnodes which is causing this
problem. You could verify by having one of your nodes use 64 vnodes instead
of the default 256... you will get less even distribution with current
vnode random allocation, but you will pay less of a penalty for having
multiple ranges...

=Rob
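
The trade-off mentioned above (fewer vnodes means fewer ranges but a less even distribution under random token allocation) can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: tokens are drawn uniformly at random from a 2**64 token space, only primary ownership is counted (replication is ignored), and the helper names ownership and mean_spread are invented for this example.

```python
import random
import statistics

RING = 2**64  # assumed size of the token space


def ownership(nodes: int, num_tokens: int, rng: random.Random) -> list:
    """Fraction of the ring owned by each node when tokens are picked at random."""
    tokens = sorted(
        (rng.randrange(RING), node)
        for node in range(nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * nodes
    prev = tokens[-1][0] - RING  # the first range wraps around from the last token
    for token, node in tokens:
        owned[node] += token - prev  # a node owns the range ending at its token
        prev = token
    return [o / RING for o in owned]


def mean_spread(nodes: int, num_tokens: int, trials: int = 200, seed: int = 1) -> float:
    """Average gap between the most and least loaded node over many random rings."""
    rng = random.Random(seed)
    return statistics.mean(
        max(own) - min(own)
        for own in (ownership(nodes, num_tokens, rng) for _ in range(trials))
    )


for num_tokens in (1, 64, 256):
    print(f"num_tokens={num_tokens:>3}: mean ownership spread "
          f"{mean_spread(4, num_tokens):.3f}")
```

In this model the spread between the most and least loaded node shrinks as the number of tokens per node grows, which is the flip side of paying for more ranges per query.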