Posted to user@cassandra.apache.org by Edouard COLE <Ed...@rgsystem.com> on 2015/09/02 18:11:21 UTC

Feedback on Secondary Indexes

Hello,

I'm not sure this is the right place for this, but it might help people running into the same issue, so here is some feedback on what I have been dealing with over the last few months.

I've been using a secondary index (yes, this is bad), but the side effects I've been fighting were not documented or discussed anywhere, which is why I want to share my experience with it.

Background:
- The cluster was 8 nodes (around 30GB per node) with 4000 reads/s and 4000 writes/s
- The biggest table (event_data) is 200GB cluster wide, and takes something like 1000 writes/s and a few reads/s
- This table had a secondary index (1.2GB cluster wide) (yes, this is huge)

CREATE TABLE event_data (
  object int,
  created_at timeuuid,
  message text,
  source text,
... few other things...
  PRIMARY KEY ((object), created_at)
) WITH
... few other things...
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX event_data_source_index ON event_data (source);

We had many issues with the cluster, and some of them were hard to correlate:
- Unable to add a node: it stayed in the joining state forever, at 100% CPU and with no WARN/ERROR logs
- When running a repair, all my thrift clients were timing out randomly (wtf)

I decided to switch the joining node to DEBUG logging, and the logs were talking only about the index being synced, but at a very slow pace (showing every indexed key). I concluded that the index was being synchronized more slowly than it was actually changing on the other nodes, so the join could never catch up and finish.

I decided to drop the secondary index, and everything is now running fine!
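
For anyone who wants to do the same, dropping the index should come down to a single CQL statement. This is only a sketch using the index name from the CREATE INDEX statement above; it assumes you are already in the keyspace that owns event_data, and that nothing still queries the table by source:

DROP INDEX event_data_source_index;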

We also changed the compaction strategy from STCS to DTCS, and it rocks!
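
As a rough sketch, the compaction change can be expressed as an ALTER TABLE on the same table. The statement below assumes a Cassandra version that ships DateTieredCompactionStrategy, and it leaves every DTCS option at its default, since the options actually used are not listed in this message:

ALTER TABLE event_data
  WITH compaction = {'class': 'DateTieredCompactionStrategy'};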

I hope this message will help someone someday,

Edouard COLE

tags: secondary index join stuck timeout thrift feedback random repair