Posted to user@cassandra.apache.org by mcasandra <mo...@gmail.com> on 2011/04/12 02:56:09 UTC

Lot of pending tasks for writes

I am running stress test and on one of the nodes I see:

[root@dsdb5 ~]# nodetool -h `hostname` tpstats
Pool Name                    Active   Pending      Completed
ReadStage                         0         0           2495
RequestResponseStage              0         0         242202
MutationStage                    48       521         287850
ReadRepairStage                   0         0            799
GossipStage                       0         0          10639
AntiEntropyStage                  0         0              0
MigrationStage                    0         0            202
MemtablePostFlusher               1         2           1047
StreamStage                       0         0              0
FlushWriter                       1         1           1047
FILEUTILS-DELETE-POOL             0         0           2048
MiscStage                         0         0              0
FlushSorter                       0         0              0
InternalResponseStage             0         0              0
HintedHandoff                     1         3              5

and cfstats

Keyspace: StressKeyspace
        Read Count: 2494
        Read Latency: 4987.431669206095 ms.
        Write Count: 281705
        Write Latency: 0.017631469090005503 ms.
        Pending Tasks: 49
                Column Family: StressStandard
                SSTable count: 882
                Space used (live): 139589196497
                Space used (total): 139589196497
                Memtable Columns Count: 6
                Memtable Data Size: 14204955
                Memtable Switch Count: 1932
                Read Count: 2494
                Read Latency: 5921.633 ms.
                Write Count: 282522
                Write Latency: 0.017 ms.
                Pending Tasks: 32
                Key cache capacity: 1000000
                Key cache size: 1198
                Key cache hit rate: 0.0013596193065941536
                Row cache: disabled
                Compacted row minimum size: 219343
                Compacted row maximum size: 5839588
                Compacted row mean size: 557125

I am just running a simple test on a 6-node Cassandra cluster with a 4 GB heap,
96 GB RAM, and 12 cores per host. I am inserting 1M rows with an average column
size of 250 KB. I keep getting "Dropped mutation" messages in the logs. Not sure
how to troubleshoot or tune it.
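For scale, a back-of-the-envelope check of the data volume this test pushes (a sketch using the figures above; replication, commitlog, and storage overhead are ignored):

```python
# Rough sizing of the stress test described above.
rows = 1_000_000
avg_col_size = 250 * 1024          # 250 KB average column size
raw_bytes = rows * avg_col_size    # raw payload before replication/overhead

print(raw_bytes / 1024**3)         # ~238 GiB of raw data per full pass
```

That is a quarter-terabyte of payload per pass, before compaction re-reads and re-writes any of it, which is why the disks matter so much here.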

Can someone please help?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Lot-of-pending-tasks-for-writes-tp6263462p6263462.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Lot of pending tasks for writes

Posted by Peter Schuller <pe...@infidyne.com>.
> It does appear that I am IO bound. Disks show about 90% util.

Well, also pay attention to the average queue size column. If there
are constantly more requests waiting to be serviced than you have
platters, you're almost certainly I/O bound. The utilization number
can be a bit flaky sometimes, and 90% isn't so far below 100% that
the gap can't be attributed to inexactness in the kernel's
measurements.
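The rule of thumb above can be sketched as a tiny helper (the field names and the 90% threshold are illustrative, not from any real tool's output format):

```python
def likely_io_bound(avg_queue_size: float, platters: int, util_pct: float) -> bool:
    """Heuristic from the advice above: if the average number of requests
    waiting to be serviced exceeds the number of platters, or utilization
    is pinned near 100%, the disks are almost certainly the bottleneck."""
    return avg_queue_size > platters or util_pct >= 90.0

print(likely_io_bound(avg_queue_size=8.0, platters=4, util_pct=90.0))  # True
```

You would feed this the `avgqu-sz` and `%util` columns from `iostat -x`.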

> What are my options then? Is cassandra not suitable for columns of this
> size?

It depends. Cassandra is a log-structured database, meaning that all
writes are sequential and you are going to be doing background
compactions that imply re-reading and re-writing data.

This optimization makes sense in particular for smaller values where
the cost of doing sequential I/O is a lot less than seek-bound I/O,
but it is less relevant for large values.

The main "cost" of background compactions is the extra reading and
writing of data that happens. If your workload is full of huge values,
then the only significant cost *is* the sequential I/O. So in that
sense, background compaction becomes more expensive relative to the
theoretical optimum than it does for small values.
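One way to quantify that relative cost: each compaction pass re-reads and re-writes the data, so total disk I/O is a multiple of the payload. A hypothetical illustration (the pass count here is made up for the example):

```python
def total_io_bytes(payload_bytes: int, compaction_passes: int) -> int:
    """Initial write plus one re-read and one re-write per compaction pass."""
    return payload_bytes + compaction_passes * 2 * payload_bytes

# 1 GiB of data that goes through 3 compaction passes costs 7 GiB of disk I/O.
gib = 1024**3
print(total_io_bytes(gib, 3) // gib)  # 7
```

With tiny values, most of the real cost is seeks, so this sequential amplification is cheap; with 250 KB values, this sequential I/O *is* the whole cost.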

It depends on the details of the access pattern, but I'd say that (1) for
very large values, Cassandra's advantages become less pronounced in
terms of local storage on each node, although the clustering
capabilities remain relevant, and (2) depending on the details of
the use-case, Cassandra *may* not be terribly suitable.

> I am running the stress code from Hector, which doesn't seem to give the
> ability to throttle operations per second. I am inserting 1M rows and then
> reading. I have not been able to do both in parallel because of I/O issues.

stress.py doesn't support any throttling, except very indirectly
by limiting the total number of threads.

In a situation like this I think you need to look at what your target
traffic is going to be like. Throwing un-throttled traffic at the
cluster like stress.py does is not indicative of normal traffic
patterns. For typical use-cases with small columns this is still
handled well, but when you are both unthrottled *and* are throwing
huge columns at it, there is no expectation that this is handled very
well.

So, for large values like this I recommend figuring out what the
actual expected sustained amount of writes is, and then benchmark
that. Using stress.py out-of-the-box is not giving you much relevant
information, other than the known fact that throwing huge-column
traffic at Cassandra without throttling is not handled very
gracefully.
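A minimal way to benchmark a target sustained rate, instead of un-throttled load, is a simple pacing loop around the insert call (a sketch; `do_insert` stands in for whatever client call your test uses):

```python
import time

def run_throttled(do_insert, total_ops: int, ops_per_sec: float) -> None:
    """Issue total_ops calls to do_insert, pacing them so the sustained
    rate stays at ops_per_sec rather than whatever the client can push."""
    interval = 1.0 / ops_per_sec
    start = time.monotonic()
    for i in range(total_ops):
        do_insert(i)
        # Sleep until this op's scheduled slot; skip the sleep if we're behind.
        delay = start + (i + 1) * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

Point this at your expected production write rate and watch tpstats: if 'pending' stays flat at that rate, the cluster keeps up, which is the question an un-throttled run cannot answer.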

That said, when using un-throttled benchmarking like stress.py:
any time you're throwing more traffic at the cluster than it
can handle, it is *fully expected* that you will see the 'active'
stages saturated and a build-up of 'pending' operations. This is
the expected result of submitting more requests per second than
can be processed - in pretty much any system. You queue up to
some degree, and eventually you start having to drop or fail
requests.

The unique thing about large columns is that it becomes a lot easier
to saturate a node with a single (or a few) stress.py clients than it
is when stressing with a more normal type of load. The extra cost of
dealing with large values is higher in Cassandra than it is in
stress.py; so suddenly a single stress.py client can easily saturate
lots of nodes, simply because upping the column sizes lets you
trivially write data at very high throughput.
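That scaling is easy to see with illustrative numbers: at a fixed request rate, data throughput grows linearly with column size.

```python
def mb_per_sec(ops_per_sec: int, col_size_bytes: int) -> float:
    """Data throughput implied by a request rate and a column size."""
    return ops_per_sec * col_size_bytes / 1024**2

# The same 200 ops/s that is trivial with 1 KB columns...
print(mb_per_sec(200, 1024))        # ~0.2 MB/s
# ...becomes ~49 MB/s per client with 250 KB columns.
print(mb_per_sec(200, 250 * 1024))  # ~48.8 MB/s
```

One client at that rate is already near the sequential write capacity of a single disk, before compaction doubles the work.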

-- 
/ Peter Schuller

Re: Lot of pending tasks for writes

Posted by mcasandra <mo...@gmail.com>.
It does appear that I am IO bound. Disks show about 90% util.

What are my options then? Is cassandra not suitable for columns of this
size?

I am running the stress code from Hector, which doesn't seem to give the
ability to throttle operations per second. I am inserting 1M rows and then
reading. I have not been able to do both in parallel because of I/O issues.


Re: Lot of pending tasks for writes

Posted by Peter Schuller <pe...@infidyne.com>.
> I am just running a simple test on a 6-node Cassandra cluster with a 4 GB heap,
> 96 GB RAM, and 12 cores per host. I am inserting 1M rows with an average column
> size of 250 KB. I keep getting "Dropped mutation" messages in the logs. Not sure
> how to troubleshoot or tune it.

Average col size of 250k - that sounds to me like you're almost
certainly going to be bottlenecking on disk I/O.

Saturating your "active" in the mutation stage and building up pending
is consistent with simply writing faster than writes can be handled.
At first I was skeptical and figured maybe something was wrong, but
upon re-reading and spotting your 250k column size - it's really easy
to have a stress client saturate nodes with data sizes that large.

The first thing I would do is just look at what's going on on the
system. For example, run "iostat -x -k 1" on the machines and see
whether you're completely disk bound or not. I suspect you are, and
that the effects you're seeing are simply the result of that.

However, that depends on how many mutations per second you're
actually sending. But if you're using out-of-the-box stress.py without
rate limiting and with a column size of 250k, I am not at all
surprised that you're easily able to saturate your nodes.

-- 
/ Peter Schuller

Re: Lot of pending tasks for writes

Posted by mcasandra <mo...@gmail.com>.
Can someone please help?
