Posted to user@cassandra.apache.org by Pierre Devops <pi...@gmail.com> on 2015/05/07 10:44:53 UTC

Slow bulk loading

Hi,

I'm streaming a big sstable with the sstableloader bulk loader, but it's
very slow (3 MB/s):

Summary statistics:
   Connections per host:         : 1
   Total files transferred:      : 1
   Total bytes transferred:      : 10357947484
   Total duration (ms):          : 3280229
   Average transfer rate (MB/s): : 3
   Peak transfer rate (MB/s):    : 3
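
As a quick sanity check, the reported rate can be recomputed from the byte count and duration in the summary. This is only a sketch; the variable names are illustrative, and it does not say whether sstableloader reports decimal megabytes or mebibytes, so both are shown.

```python
# Figures copied from the summary statistics above.
total_bytes = 10_357_947_484
duration_ms = 3_280_229

seconds = duration_ms / 1000
bytes_per_second = total_bytes / seconds

mb_per_s = bytes_per_second / 1_000_000        # decimal megabytes per second
mib_per_s = bytes_per_second / (1024 * 1024)   # mebibytes per second

print(f"duration: {seconds / 60:.1f} min")
print(f"rate: {mb_per_s:.2f} MB/s ({mib_per_s:.2f} MiB/s)")
```

Either way the result rounds to the 3 MB/s shown in the summary, over a transfer that took a little under an hour.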

I'm on a single-node configuration with an empty keyspace and table, on good
hardware (8 x 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so there are plenty
of resources for the process. I'm uploading from another server.

The sstable is 9 GB in size and has 4 partitions, but with a lot of rows per
partition (around 100 million). The clustering key is an INT and there are 4
other regular columns, so approximately 500 million cells per ColumnFamily.

When I upload, I notice that one core of the Cassandra node is at full CPU
(all other cores are idle), so I assume I'm CPU bound on the node side. But
why? What is the node doing? Why does it take so long?

Re: Slow bulk loading

Posted by Mike Neir <mi...@liquidweb.com>.
It sounds as though you could be having trouble with garbage collection. Check
your Cassandra system logs and search for "GC". If you see frequent garbage
collections taking more than a second or two to complete, you're going to need
to do some configuration tweaking.
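
A minimal sketch of that log search, assuming each GC line carries its pause time as a number followed by "ms". The exact wording of GC log lines varies between Cassandra versions, so the regular expression may need adjusting. The function name, the threshold, and the sample lines are made up for illustration and are not real Cassandra output.

```python
import re

# Matches a duration such as "1523ms" or "1523 ms" anywhere in a line.
DURATION_RE = re.compile(r"(\d+)\s*ms\b")

def slow_gc_lines(lines, threshold_ms=1000):
    """Return (duration_ms, line) for lines mentioning GC at or above the threshold."""
    hits = []
    for line in lines:
        if "GC" not in line:
            continue
        match = DURATION_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip()))
    return hits

# Made-up sample lines for illustration only, not real Cassandra output.
sample = [
    "INFO example: ParNew GC in 240ms",
    "INFO example: ConcurrentMarkSweep GC in 1523ms",
    "INFO example: compaction finished in 5000ms",
]
for duration, line in slow_gc_lines(sample):
    print(duration, line)
```

To run it against a real log, pass an open file instead of the sample list, for example `slow_gc_lines(open(path))`, where `path` points at your system log (its location depends on how Cassandra was installed).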


-- 



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator


Re: Slow bulk loading

Posted by Nate McCall <na...@thelastpickle.com>.
> When I upload I notice one core of the cassandra node is full CPU (all
> other cores are idling),

Take a look at the interrupt distribution (cat /proc/interrupts). You'll
probably see disk and network interrupts mostly/all bound to CPU0. If that
is the case, this article has an excellent description of the underlying
issue as well as some work-arounds:
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
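
To make the distribution easier to read than the raw file, here is a small sketch that sums the per-CPU columns of /proc/interrupts. The function name and the sample text are hypothetical; the sample only mimics the general layout of the file (a header row of CPU names, then one row per interrupt source).

```python
def interrupt_totals(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _, _, rest = line.partition(":")     # drop the IRQ label before the colon
        fields = rest.split()[:len(cpus)]
        # Skip rows without one numeric count per CPU (e.g. a single-value ERR row).
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for i, field in enumerate(fields):
            totals[i] += int(field)
    return dict(zip(cpus, totals))

# Made-up sample for illustration; real files have many more rows and columns.
sample = """
           CPU0       CPU1
  0:         40          0   IO-APIC-edge      timer
 24:     900000         12   PCI-MSI-edge      eth0
 25:     300000          3   PCI-MSI-edge      ahci
ERR:          0
"""
print(interrupt_totals(sample))
```

On a live Linux node you would call `interrupt_totals(open("/proc/interrupts").read())`. A total that is heavily skewed toward CPU0, as in the sample, is the pattern described above.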



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com