Posted to user@cassandra.apache.org by mcasandra <mo...@gmail.com> on 2011/04/12 20:36:14 UTC

flush_largest_memtables_at messages in 7.4

I am using Cassandra 0.7.4 and getting these messages.

Heap is 0.7802529021498031 full. You may need to reduce memtable and/or
cache sizes Cassandra will now flush up to the two largest memtables to free
up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if
you don't want Cassandra to do this automatically

How do I verify that I need to adjust any thresholds? And how do I
calculate the correct value?

When I got this message, only reads were occurring.

create keyspace StressKeyspace
    with replication_factor = 3
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use StressKeyspace;
drop column family StressStandard;
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 1000000
    and memtable_flush_after = 1440
    and memtable_throughput = 128;

 nodetool -h dsdb4 tpstats
Pool Name                    Active   Pending      Completed
ReadStage                        32       281         456598
RequestResponseStage              0         0         797237
MutationStage                     0         0         499205
ReadRepairStage                   0         0         149077
GossipStage                       0         0         217227
AntiEntropyStage                  0         0              0
MigrationStage                    0         0            201
MemtablePostFlusher               0         0           1842
StreamStage                       0         0              0
FlushWriter                       0         0           1841
FILEUTILS-DELETE-POOL             0         0           3670
MiscStage                         0         0              0
FlushSorter                       0         0              0
InternalResponseStage             0         0              0
HintedHandoff                     0         0             15

cfstats

Keyspace: StressKeyspace
        Read Count: 460988
        Read Latency: 38.07654727454945 ms.
        Write Count: 499205
        Write Latency: 0.007409593253272703 ms.
        Pending Tasks: 0
                Column Family: StressStandard
                SSTable count: 9
                Space used (live): 247408645485
                Space used (total): 247408645485
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 1878
                Read Count: 460989
                Read Latency: 28.237 ms.
                Write Count: 499205
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 1000000
                Key cache size: 299862
                Key cache hit rate: 0.6031833150384193
                Row cache: disabled
                Compacted row minimum size: 219343
                Compacted row maximum size: 5839588
                Compacted row mean size: 497474


--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/flush-largest-memtables-at-messages-in-7-4-tp6266221p6266221.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
64-bit, 12 cores, 96 GB RAM.


Re: flush_largest_memtables_at messages in 7.4

Posted by Jonathan Colby <jo...@gmail.com>.
Your JVM heap has reached 78%, so Cassandra automatically flushes its
memtables. You need to explain more about your configuration: 32- or 64-bit
OS? What is the max heap? How much RAM is installed?

If this happens under stress-test conditions it's probably understandable.
You should look into graphing your memory usage, or use jconsole to graph
the heap during your tests.

On Apr 12, 2011, at 8:36 PM, mcasandra wrote:

> I am using Cassandra 0.7.4 and getting these messages.
> 
> Heap is 0.7802529021498031 full. You may need to reduce memtable and/or
> cache sizes Cassandra will now flush up to the two largest memtables to free
> up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if
> you don't want Cassandra to do this automatically
> 
> How do I verify that I need to adjust any thresholds? And how do I
> calculate the correct value?
> 
> When I got this message, only reads were occurring.
> 
> [snip: schema and nodetool/cfstats output quoted in full above]


Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
And by the way, I've been assuming your reads are running without those
250k-column inserts going on at the same time. It would be difficult to
see what's going on with both of those traffic patterns mixed together.


-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
And by the way, you can also read the average wait time and average
latency columns directly out of iostat, to confirm that individual I/O
requests are taking longer than they would on a non-saturated drive.
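As a sketch, those two columns can be pulled out of an `iostat -x` device line with awk. The sample line and the field positions below are assumptions (the column layout varies across sysstat versions), so match them against the header row on your own system first:

```shell
# Hypothetical `iostat -x` device line; the field positions are an
# assumption -- check them against your own header row before relying on this.
line="sda 0.00 12.00 350.00 10.00 44800.00 512.00 124.48 114.32 317.50 2.78 100.00"
avgqu=$(echo "$line" | awk '{print $9}')    # avgqu-sz: avg requests queued
await=$(echo "$line" | awk '{print $10}')   # await: avg ms per request, incl. queueing
echo "avgqu-sz=$avgqu await=${await}ms"
```

A consistently high await relative to the drive's unloaded service time is exactly the confirmation described above.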

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> Actually, when I run 2 stress clients in parallel I see Read Latency stay
> the same. I wonder if Cassandra is reporting accurate numbers.

Or you're just bottlenecking on something else. Are you running the
extra stress client on a different machine, for example, so that the
client machine itself isn't the thing saturating?

> I understand your analogy, but for some reason I don't see that happening
> in the results I am seeing with multiple stress clients running. So I am
> just confused about where the real bottleneck is.

If your queue size to your device is consistently high (you were
mentioning numbers in the ~100 range), you're saturating on disk,
period. Unless your disk is a 500-drive RAID volume and 100 requests
represents 1/5 of capacity... (If you have a RAID volume with a few
disks or an SSD, you want to switch to the noop or deadline scheduler,
by the way.)
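Checking and switching the scheduler is a quick sysfs operation. A sketch (sda is a placeholder for your data device, and the change does not persist across reboots):

```shell
# Show the available schedulers; the active one is shown in [brackets].
cat /sys/block/sda/queue/scheduler

# Switch to deadline (needs root; use an elevator= boot parameter or a
# udev rule if you want it to survive a reboot).
echo deadline > /sys/block/sda/queue/scheduler
```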

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
Actually, when I run 2 stress clients in parallel I see Read Latency stay
the same. I wonder if Cassandra is reporting accurate numbers.

I understand your analogy, but for some reason I don't see that happening
in the results I am seeing with multiple stress clients running. So I am
just confused about where the real bottleneck is.




Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> I still don't understand. You would expect read latency to increase
> drastically when it's fully saturated, and a lot of dropped READ messages
> too, correct? I don't see that in cfstats or system.log, which I don't
> really understand.

No. With a fixed concurrency there are only so many outstanding
requests at any given moment. Unless the concurrency is large enough
to exhaust queue sizes, that's not what you will see. It's just the
way it works.

An analogous situation is this: you have N threads all doing random
reads from your disk. If each thread is responsible for exactly one
outstanding I/O operation at a time, you will never have more than N
outstanding requests to the I/O device in total. On average, then, when
a request starts it will have N - 1 requests in "front" of it waiting
to be processed, so the latency will increase linearly with N. Double
the concurrency, double the latency.

Something like Cassandra/PostgreSQL/etc. behaves similarly. If you're
running a benchmark at a fixed concurrency of N, you won't have more
than N total requests outstanding at any given moment.
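The linear relationship is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch (the 3 ms service time is a made-up number):

```shell
# Closed-loop model: N clients, each with exactly one outstanding request.
# Average latency ~= N * service_time, so latency grows linearly with N
# while per-disk throughput stays roughly constant at 1/service_time.
service_ms=3                       # hypothetical avg time to serve one request
for n in 8 16 32; do
    echo "concurrency=$n -> approx avg latency $((n * service_ms)) ms"
done
```

So doubling the client concurrency should roughly double the reported read latency without changing throughput much.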

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
I still don't understand. You would expect read latency to increase
drastically when it's fully saturated, and a lot of dropped READ messages
too, correct? I don't see that in cfstats or system.log, which I don't
really understand.


Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> But read latency is still something like 30 ms, which I would think would
> be much higher if it's saturated.

No. You're using stress, so you have some total cap on concurrency.
Given a fixed concurrency, you'll saturate at some particular average
latency, which is mostly a function of the backlog implied by the
concurrency and the average time each request takes to process.

If you double the concurrency of your stress clients, you should
expect roughly twice the average latency.

Running a fixed-concurrency benchmark against a saturated cluster will
have vastly different effects on latency than trying to serve real
live traffic, with no feedback mechanism, on a system that is
processing fewer requests than are arriving.

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
One correction: queue size in iostat ranges between 6 and 120. But this
still doesn't explain why read latency is low in cfstats.


Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
Peter Schuller wrote:
> 
> Saturated.
> 
But read latency is still something like 30 ms, which I would think would
be much higher if it's saturated.



Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> Does it really matter how long Cassandra has been running? I thought it
> would keep at least 1M keys.

It will keep up to the limit, and it will save caches periodically and
reload them on start. But the cache needs to be populated by traffic
first. If you wrote a bunch of data, enabled the cache, and began
reading, you have to wait for the cache to be populated before looking
at cache locality.

Note that the saving of caches is periodic, and if you were constantly
restarting nodes during testing, it may never have been saved with the
full set of keys.

> Regarding your previous question about queue size in iostat, I see it
> ranging from 114 to 300.

Saturated.

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
Does it really matter how long Cassandra has been running? I thought it
would keep at least 1M keys.

Regarding your previous question about queue size in iostat, I see it
ranging from 114 to 300.




Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> One thing I am noticing is that the cache hit rate is very low, even though
> my key cache size is 1M and I have fewer than 1M rows. Not sure why there
> are so many cache misses?

The key cache should be strictly LRU for read-only workloads. For
write/read workloads it may not be strictly LRU because compaction
causes key cache migration.

In your case:

>                Key cache capacity: 1000000
>                Key cache size: 256013
>                Key cache hit rate: 0.33801452784503633

So you have only 256k keys in the cache. Have you run long enough after
enabling it for the cache to actually become fully populated?

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
One thing I am noticing is that the cache hit rate is very low, even though
my key cache size is 1M and I have fewer than 1M rows. Not sure why there
are so many cache misses?

Keyspace: StressKeyspace
        Read Count: 162506
        Read Latency: 45.22479006928975 ms.
        Write Count: 247180
        Write Latency: 0.011610943442026053 ms.
        Pending Tasks: 0
                Column Family: StressStandard
                SSTable count: 184
                Space used (live): 99616537894
                Space used (total): 99616537894
                Memtable Columns Count: 351
                Memtable Data Size: 171716049
                Memtable Switch Count: 543
                Read Count: 162507
                Read Latency: 317.892 ms.
                Write Count: 247180
                Write Latency: 0.006 ms.
                Pending Tasks: 0
                Key cache capacity: 1000000
                Key cache size: 256013
                Key cache hit rate: 0.33801452784503633
                Row cache: disabled
                Compacted row minimum size: 182786
                Compacted row maximum size: 5839588
                Compacted row mean size: 537470




Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> Yes

Without checking, I don't know the details of the memtable threshold
calculations well enough to be sure whether large columns are somehow
causing the size estimations to be ineffective (offhand I would expect
the reverse, since the overhead of the Java object structures becomes
much less significant). But if that is not the case, then this
particular problem should be a matter of adjusting heap size according
to your memtable thresholds, i.e. increase the heap size and/or
decrease the memtable flush thresholds.
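Concretely, in 0.7 both knobs are easy to turn. A sketch, assuming the stock conf/cassandra-env.sh from the distribution (the values are illustrative, not recommendations):

```shell
# conf/cassandra-env.sh -- raise the JVM heap (illustrative sizes):
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"

# ...or lower the per-CF flush threshold from cassandra-cli, using the
# same attribute the schema above already sets:
#   update column family StressStandard with memtable_throughput = 64;
```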

-- 
/ Peter Schuller

Re: flush_largest_memtables_at messages in 7.4

Posted by mcasandra <mo...@gmail.com>.
Yes


Re: flush_largest_memtables_at messages in 7.4

Posted by Peter Schuller <pe...@infidyne.com>.
> Heap is 0.7802529021498031 full. You may need to reduce memtable and/or
> cache sizes Cassandra will now flush up to the two largest memtables to free
> up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if
> you don't want Cassandra to do this automatically
>
> How do I verify that I need to adjust any thresholds? And how to calculate
> correct value?

Is this on the same cluster/nodes where you're doing your 250k-column
stress runs (the other thread)?

In any case, for typical cases there is:

   http://www.datastax.com/docs/0.7/operations/tuning

-- 
/ Peter Schuller