Posted to user@cassandra.apache.org by "James A. Robinson" <ji...@gmail.com> on 2020/10/23 16:26:33 UTC

sstable processing times

Hi folks,

I'm running a job on an offline node to test how long it takes to run
sstablesplit on several large sstables.
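
For reference, the invocation is roughly the following (the data path,
keyspace, and table names are placeholders for my layout, and the -s
target size is just an example; Cassandra is stopped while this runs):

    # Split the offline sstable into output files of at most 100 MB each.
    # -s/--size is the max size in MB per output sstable (default is 50 MB
    # if I'm reading --help correctly); --no-snapshot skips the snapshot
    # the tool would otherwise take first.
    sstablesplit --no-snapshot -s 100 \
        /var/lib/cassandra/data/my_keyspace/my_table-*/mc-*-big-Data.db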

I'm a bit dismayed to see it took about 22 hours to process a 1.5
gigabyte sstable!  I worry about the 32 gigabyte sstable that is my
ultimate target to split; at that rate it would take roughly 470 hours,
or close to three weeks.

This is running on an otherwise unloaded Linux 3.10.0 CentOS 7 server
with 4 cpus and 24 gigabytes of ram.  Cassandra 3.11.0 and OpenJDK
1.8.0_252 are the installed versions of the software.

The machine isn't very busy either.  It looks as though java is only
making use of 1 of the 4 processors, and it isn't using much of the
available 24 gigabytes of memory; almost all of the memory usage is in
the Linux buffer cache, which I guess makes sense if it's just streaming
through these large files without doing a lot of heavy computation on
what it reads from them.

When you folks run sstablesplit, do you provide specific
CASSANDRA_INCLUDE settings to increase its performance?
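
(To frame the question: what I had in mind is something like the sketch
below, assuming the sstablesplit wrapper script sources $CASSANDRA_INCLUDE
and honors MAX_HEAP_SIZE the way the other sstable tools seem to; the
include path and heap size are just illustrative.)

    # Hypothetical tuning sketch, not something I've verified: point the
    # tool at the packaged include and give the JVM a larger heap, assuming
    # the wrapper only sets a small default when MAX_HEAP_SIZE is unset.
    export CASSANDRA_INCLUDE=/usr/share/cassandra/cassandra.in.sh
    MAX_HEAP_SIZE=2G sstablesplit --no-snapshot -s 100 \
        /var/lib/cassandra/data/my_keyspace/my_table-*/mc-*-big-Data.db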

Jim



Re: sstable processing times

Posted by Erick Ramirez <er...@datastax.com>.
The operation runs in a single anti-compaction thread, so it won't
consume more than 1 CPU. It will mostly be IO-bound, with the disk being
the biggest bottleneck. Are you running it on a direct-attached SSD? It
won't perform well if you're running it on an EBS volume or some other
slow disk. Cheers!
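
If it helps narrow it down, one way to confirm the disk is the limiting
factor is to watch device utilization while the split runs (standard
sysstat tooling, nothing Cassandra-specific):

    # Extended per-device stats every 5 seconds; a device sitting near
    # 100 %util with high await while sstablesplit runs suggests the disk
    # is the bottleneck.
    iostat -x -d 5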