Posted to user@cassandra.apache.org by Dikang Gu <di...@gmail.com> on 2016/10/28 17:03:56 UTC

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

We are seeing a huge CPU regression when upgrading one of our 2.0.16 clusters
to 2.1.14 as well. The 2.1.14 node is not able to handle the same amount of
read traffic as the 2.0.16 node; in fact, it handles less than 50% of it.

And in the perf results, the first entry climbs as high as 50% as we turn
up the read traffic; it never appeared in 2.0.16.

Any thoughts?
Thanks


Samples: 952K of event 'cycles', Event count (approx.): 229681774560
Overhead  Shared Object                          Symbol
   6.52%  perf-196410.map                        [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   4.84%  libzip.so                              [.] adler32
   2.88%  perf-196410.map                        [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
   2.39%  perf-196410.map                        [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
   2.03%  perf-196410.map                        [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   1.65%  perf-196410.map                        [.] vtable chunks
   1.44%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
   1.02%  perf-196410.map                        [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   1.00%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x0000000000003804
   0.87%  perf-196410.map                        [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.82%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036dc
   0.79%  [kernel]                               [k] copy_user_generic_string
   0.73%  perf-196410.map                        [.] vtable chunks
   0.71%  perf-196410.map                        [.] Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.70%  [kernel]                               [k] find_busiest_group
   0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
   0.68%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   0.65%  perf-196410.map                        [.] jbyte_disjoint_arraycopy
   0.64%  [kernel]                               [k] _raw_spin_lock
   0.63%  [kernel]                               [k] __schedule
   0.45%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036df
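
For reference, a sketch of one way to capture a profile like the one above: it assumes perf-map-agent is available to generate the /tmp/perf-<pid>.map file that lets perf resolve JIT-compiled Java symbols; the install path and script name below are illustrative, not part of Cassandra.

    # Resolve the Cassandra JVM pid (adjust the pattern for your setup).
    PID=$(pgrep -f CassandraDaemon | head -n1)

    # Generate /tmp/perf-$PID.map so perf can symbolize JIT-compiled frames
    # (assumes perf-map-agent checked out under ~/perf-map-agent).
    ~/perf-map-agent/bin/create-java-perf-map.sh "$PID"

    # Sample on-CPU cycles for 30 seconds, then print the top of the report.
    sudo perf record -F 99 -g -p "$PID" -- sleep 30
    sudo perf report --stdio | head -n 40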

On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <co...@bestdata.be> wrote:

> @JC, get the pid of your target Java process (something like "ps -ef |
> grep -i cassandra").
> Then do a kill -3 <pid> (on unix/linux).
> Check the stdout logfile of the process; it should contain the thread dump.
> If you found it, great!
> Let that kill -3 loop run for about 2 or 3 minutes.
> Afterwards, copy, paste and load the stdout file into one of the mentioned
> tools.
> If you are not familiar with the Java internals, those thread dumps
> will teach you a lot :)
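
A minimal sketch of the loop described above (the pid lookup and log location are assumptions; adjust them for your packaging):

    # Take 60 thread dumps, 2 seconds apart, from the running Cassandra JVM.
    PID=$(pgrep -f CassandraDaemon | head -n1)
    for i in $(seq 1 60); do
        kill -3 "$PID"      # the JVM appends a full thread dump to its stdout log
        sleep 2
    done
    # The dumps end up in the JVM's stdout log (often /var/log/cassandra/output.log,
    # depending on packaging); load that file into a thread dump analyzer.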
>
>
>
>
> On Friday, January 29, 2016, Jean Carlo <je...@gmail.com> wrote:
>
>> I am having the same issue after upgrading Cassandra to 2.1.12 from 2.0.10. I
>> am not good with the JVM, so I would like to know how to do what
>> @CorryOpdenakker proposes with Cassandra.
>>
>> :)
>>
>> I check concurrent_compactors
>>
>>
>> Saludos
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>> wrote:
>>
>>> Hi guys,
>>> Cassandra is still new to me, but I have a lot of Java tuning
>>> experience.
>>>
>>> For root-cause detection of performance degradations, it's always good to
>>> start by collecting a series of Java thread dumps. At the moment the problem
>>> occurs, use a loop script to take, for example, 60 thread dumps with an
>>> interval of 1 or 2 seconds.
>>> Then load those dumps into the IBM thread dump analyzer, "Eclipse MAT",
>>> or any similar tool and see which methods appear to be most active or
>>> blocking others.
>>>
>>> It's really very useful.
>>>
>>> The same can be done in a normal situation to compare the difference.
>>>
>>> That should give more insights.
>>>
>>> Cheers, Corry
>>>
>>>
>>> On Friday, January 29, 2016, Peddi, Praveen <pe...@amazon.com> wrote:
>>>
>>>> Hello,
>>>> We have another update on performance on 2.1.11. compression_chunk_size
>>>> didn’t really help much, but we changed concurrent_compactors from the default
>>>> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
>>>> read latencies are still 1.5x slower than 2.0.9. One thing we noticed in the JMX
>>>> metrics that could affect read latencies is that 2.1.11 is running
>>>> ReadRepairedBackground and ReadRepairedBlocking far more frequently than
>>>> 2.0.9, even though our read_repair_chance is the same on both. Could anyone
>>>> shed some light on why 2.1.11 could be running read repair 10 to 50 times
>>>> more often despite the same configuration on both clusters?
>>>>
>>>> dclocal_read_repair_chance=0.100000 AND
>>>> read_repair_chance=0.000000 AND
>>>>
>>>> Here is the table for read repair metrics for both clusters.
>>>>                                   2.0.9   2.1.11
>>>> ReadRepairedBackground  5MinAvg    0.006   0.1
>>>>                         15MinAvg   0.009   0.153
>>>> ReadRepairedBlocking    5MinAvg    0.002   0.55
>>>>                         15MinAvg   0.007   0.91
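
One way to test whether the extra read repairs are driving the latency gap would be to switch the chance-based read repair off on the affected table and re-check the ReadRepaired* metrics; a rough sketch (keyspace and table names are placeholders):

    cqlsh -e "ALTER TABLE my_ks.my_table
              WITH read_repair_chance = 0
              AND dclocal_read_repair_chance = 0;"
    # Then watch the ReadRepairedBackground/ReadRepairedBlocking meters in JMX
    # (and the read latencies in 'nodetool cfstats') to see whether they recover.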
>>>>
>>>> Thanks
>>>> Praveen
>>>>
>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>> Reply-To: <us...@cassandra.apache.org>
>>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> Sorry I wasn’t as explicit as I should have been
>>>>
>>>> The same buffer size is used by compressed reads as well, but it is tuned
>>>> with the compression_chunk_size table property. It’s likely true that if you
>>>> lower compression_chunk_size, you’ll see improved read performance.
>>>>
>>>> This was covered in the AWS re:Invent youtube link I sent in my
>>>> original reply.
>>>>
>>>>
>>>>
>>>> From: "Peddi, Praveen"
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>>> Cc: "Agrawal, Pratik"
>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> Hi,
>>>> We will try reducing “rar_buffer_size” to 4KB. However,
>>>> CASSANDRA-10249 <https://issues.apache.org/jira/browse/CASSANDRA-10249> says
>>>> "this only affects users who have 1. disabled compression, 2. switched to
>>>> buffered i/o from mmap’d”. Neither of these is true for us, I believe. We use
>>>> the default disk_access_mode, which should be mmap. We also used
>>>> LZ4Compressor when creating the table.
>>>>
>>>> We will let you know if this property has any effect. We were testing
>>>> with 2.1.11, and this was only fixed in 2.1.12, so we need to test with the
>>>> latest version.
>>>>
>>>> Praveen
>>>>
>>>>
>>>>
>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>> Reply-To: <us...@cassandra.apache.org>
>>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>>> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <
>>>> user@cassandra.apache.org>
>>>> Cc: "Agrawal, Pratik" <pa...@amazon.com>
>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 /
>>>> https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is
>>>> really the case depends on how much of your data is in page cache, and whether
>>>> or not you’re using mmap. Since the original question was asked by someone
>>>> using small RAM instances, it’s possible.
>>>>
>>>> We mitigate this by dropping compression_chunk_size in order to force a
>>>> smaller buffer on reads, so we don’t over-read very small blocks. This has
>>>> other side effects (a lower compression ratio, more garbage during
>>>> streaming), but it significantly speeds up read workloads for us.
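
A sketch of that mitigation on a single table, using the 2.1-era option names (keyspace/table are placeholders; existing SSTables keep their old chunk size until rewritten):

    cqlsh -e "ALTER TABLE my_ks.my_table WITH compression =
              {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 4};"
    # Rewrite existing SSTables so the new chunk size actually takes effect.
    nodetool upgradesstables -a my_ks my_table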
>>>>
>>>>
>>>> From: Zhiyan Shao
>>>> Date: Thursday, January 14, 2016 at 9:49 AM
>>>> To: "user@cassandra.apache.org"
>>>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>
>>>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>>>> forum, you can find another thread I sent a while ago. The perf test I did
>>>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>>>> 2.0.14.
>>>>
>>>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com>
>>>> wrote:
>>>>
>>>>> Thanks Jeff for your reply. Sorry for the delayed response. We were
>>>>> running some more tests and wanted to wait for the results.
>>>>>
>>>>> Basically, we saw higher CPU with 2.1.11 compared to
>>>>> 2.0.9 (see below) for the exact same load test. Memory spikes were also
>>>>> more aggressive on 2.1.11.
>>>>>
>>>>> We wanted to rule out any of our custom settings, so we ended up
>>>>> doing some testing with the Cassandra stress tool and a default Cassandra
>>>>> installation. Here are the results we saw between 2.0.9 and 2.1.11. Both
>>>>> are default installations and both use cassandra-stress with the same
>>>>> params, so this is the closest apples-to-apples comparison we can get. As you
>>>>> can see, both read and write latencies are 30 to 50% worse in 2.1.11 than in
>>>>> 2.0.9, even though we are using a default installation.
>>>>>
>>>>> *Highlights of the test:*
>>>>> Load: 2x reads and 1x writes
>>>>> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
>>>>>
>>>>> Local read latency: 0.039 ms for 2.0.9 vs 0.066 ms for 2.1.11
>>>>>
>>>>> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
>>>>>
>>>>> *One observation: as the number of threads is increased, 2.1.11
>>>>> read latencies get worse compared to 2.0.9 (see the table below for 24
>>>>> threads vs 54 threads).*
>>>>> Not sure if anyone has done this kind of comparison before and what
>>>>> their thoughts are. I am thinking for this same reason
>>>>>
>>>>> 2.0.9 Plain      type  total ops    op/s    pk/s   row/s   mean  med   0.95  0.99  0.999   max    time
>>>>>  16 threadCount  READ      66854    7205    7205    7205    1.6   1.3   2.8   3.5    9.6    85.3   9.3
>>>>>  16 threadCount  WRITE     33146    3572    3572    3572    1.3   1     2.6   3.3    7     206.5   9.3
>>>>>  16 threadCount  total    100000   10777   10777   10777    1.5   1.3   2.7   3.4    7.9   206.5   9.3
>>>>> 2.1.11 Plain
>>>>>  16 threadCount  READ      67096    6818    6818    6818    1.6   1.5   2.6   3.5    7.9    61.7   9.8
>>>>>  16 threadCount  WRITE     32904    3344    3344    3344    1.4   1.3   2.3   3      6.5    56.7   9.8
>>>>>  16 threadCount  total    100000   10162   10162   10162    1.6   1.4   2.5   3.2    6      61.7   9.8
>>>>> 2.0.9 Plain
>>>>>  24 threadCount  READ      66414    8167    8167    8167    2     1.6   3.7   7.5   16.7   208     8.1
>>>>>  24 threadCount  WRITE     33586    4130    4130    4130    1.7   1.3   3.4   5.4   25.6    45.4   8.1
>>>>>  24 threadCount  total    100000   12297   12297   12297    1.9   1.5   3.5   6.2   15.2   208     8.1
>>>>> 2.1.11 Plain
>>>>>  24 threadCount  READ      66628    7433    7433    7433    2.2   2.1   3.4   4.3    8.4    38.3   9
>>>>>  24 threadCount  WRITE     33372    3723    3723    3723    2     1.9   3.1   3.8   21.9    37.2   9
>>>>>  24 threadCount  total    100000   11155   11155   11155    2.1   2     3.3   4.1    8.8    38.3   9
>>>>> 2.0.9 Plain
>>>>>  54 threadCount  READ      67115   13419   13419   13419    2.8   2.6   4.2   6.4   36.9    82.4   5
>>>>>  54 threadCount  WRITE     32885    6575    6575    6575    2.5   2.3   3.9   5.6   15.9    81.5   5
>>>>>  54 threadCount  total    100000   19993   19993   19993    2.7   2.5   4.1   5.7   13.9    82.4   5
>>>>> 2.1.11 Plain
>>>>>  54 threadCount  READ      66780    8951    8951    8951    4.3   3.9   6.8   9.7   49.4    69.9   7.5
>>>>>  54 threadCount  WRITE     33220    4453    4453    4453    3.5   3.2   5.7   8.2   36.8    68     7.5
>>>>>  54 threadCount  total    100000   13404   13404   13404    4     3.7   6.6   9.2   48      69.9   7.5
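
For reference, a sketch of the kind of run described above (2 reads to 1 write, fixed op count, stepped thread counts); the flags follow the 2.1-era cassandra-stress syntax and may need adjusting for other versions, and the node address is a placeholder:

    # Preload some data first.
    cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1

    # Mixed workload, 2 reads : 1 write, at increasing thread counts.
    for t in 16 24 54; do
        cassandra-stress mixed ratio\(read=2,write=1\) n=100000 \
            -rate threads=$t -node 10.0.0.1
    done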
>>>>>
>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
>>>>> Praveen <pe...@amazon.com>
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>>>> experience was anywhere from 20-60%, depending on workload).
>>>>>
>>>>> However, it’s not necessarily true that everything behaves exactly the
>>>>> same – in particular, memtables are different, commitlog segment handling
>>>>> is different, and GC params may need to be tuned differently for 2.1 than
>>>>> 2.0.
>>>>>
>>>>> When the system is busy, what’s it actually DOING? Cassandra exposes a
>>>>> TON of metrics – have you plugged any into a reporting system to see what’s
>>>>> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
>>>>> pauses?
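
A quick triage sketch along those lines, using standard tools (the log path is an assumption; adjust for your packaging):

    iostat -x 5 3                                      # disk await / queue depth / %util
    pidstat -u -p "$(pgrep -f CassandraDaemon)" 5 3    # per-process CPU
    nodetool tpstats                                   # pending/blocked thread pool stages
    grep -i GCInspector /var/log/cassandra/system.log | tail -n 20   # long GC pauses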
>>>>>
>>>>> My colleagues spent a lot of time validating different AWS EBS configs
>>>>> (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4),
>>>>> 2.1 was faster in almost every case, but you’re using an instance size I
>>>>> don’t believe we tried (too little RAM to be viable in production).  c3.2xl
>>>>> only gives you 15G of ram – most “performance” based systems want 2-4x that
>>>>> (people running G1 heaps usually start at 16G heaps and leave another
>>>>> 16-30G for page cache), you’re running fairly small hardware – it’s
>>>>> possible that 2.1 isn’t “as good” on smaller hardware.
>>>>>
>>>>> (I do see your domain, presumably you know all of this, but just to be
>>>>> sure):
>>>>>
>>>>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>>>>> Which volume sizes? Are they the same between versions? Are you hitting
>>>>> your iops limits? Running out of burst tokens? Do you have enhanced
>>>>> networking enabled? At load, what part of your system is stressed? Are you
>>>>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>>>>> memtable_allocation_type -> offheap objects  (available in 2.1, not in
>>>>> 2.0)?
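
A sketch of trying the offheap memtable option mentioned above (the yaml path is an assumption; the node needs a restart afterwards):

    # Set memtable_allocation_type to offheap_objects in cassandra.yaml.
    sudo sed -i 's/^#\? *memtable_allocation_type:.*/memtable_allocation_type: offheap_objects/' \
        /etc/cassandra/cassandra.yaml
    sudo service cassandra restart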
>>>>>
>>>>> Tuning gc_grace is weird – do you understand what it does? Are you
>>>>> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
>>>>> you doing a lot of compaction?
>>>>>
>>>>>
>>>>> From: "Peddi, Praveen"
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>>>> To: "user@cassandra.apache.org"
>>>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Hi,
>>>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>>>>> environment with pretty much the same yaml settings in both (we removed unused
>>>>> yaml settings and renamed a few others), and we have noticed that performance
>>>>> on 2.1.11 is worse compared to 2.0.9. *After more investigation we found
>>>>> that the performance gets worse as we increase the replication factor on
>>>>> 2.1.11, whereas on 2.0.9 performance stays more or less the same.* Has
>>>>> anything changed architecturally as far as replication is concerned in 2.1.11?
>>>>>
>>>>> All googling only suggests 2.1.11 should be FASTER than 2.0.9, so we
>>>>> are obviously doing something different. However, the client code and load
>>>>> test are identical in both cases.
>>>>>
>>>>> Details:
>>>>> Nodes: 3 EC2 c3.2xlarge
>>>>> R/W Consistency: QUORUM
>>>>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>>>>> removed unused properties from the yaml file.
>>>>> We run aggressive compaction with a low gc_grace (15 mins),
>>>>> but this is true for both 2.0.9 and 2.1.11.
>>>>>
>>>>> As you can see, all p50, p90 and p99 latencies stayed within 10%
>>>>> on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>>>>> latencies almost doubled (reads especially are much slower than writes).
>>>>>
>>>>> # Nodes  RF  # of rows        2.0.9                  2.1.11
>>>>>                          P50    P90    P99     P50    P90    P99
>>>>> READ
>>>>> 3        1   450         306    594    747     425    849    1085
>>>>> 3        3   450         358    634    877     708    1274   2642
>>>>>
>>>>> WRITE
>>>>> 3        1   10          26     80     179     37     131    196
>>>>> 3        3   10          31     96     184     46     166    468
>>>>> Any pointers on how to debug performance issues will be appreciated.
>>>>>
>>>>> Praveen
>>>>>
>>>>
>>>>
>>>
>>> --
>>> ----------------------------------
>>> Bestdata.be
>>> Optimised ict
>>> Tel:+32(0)496609576
>>> Corry@BestData.be
>>> ----------------------------------
>>>
>>>
>>
>
> --
> ----------------------------------
> Bestdata.be
> Optimised ict
> Tel:+32(0)496609576
> Corry@BestData.be
> ----------------------------------
>
>


-- 
Dikang

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by "techpyaasa ." <te...@gmail.com>.
Hi guys,

Has anybody got a fix for this issue?
We recently upgraded our C* cluster from 2.0.17 to 2.1.17, and we saw an
increase in read latency on a few tables; read latency increased to almost 2 to
3 times what it was in 2.0.17.

Kindly let me know the fix if anybody knows it.

Thanks
TechPyaasa

On Wed, Nov 9, 2016 at 1:28 AM, Dikang Gu <di...@gmail.com> wrote:

> Michael, thanks for the info. It sounds to me like a very serious performance
> regression. :(
>
> On Tue, Nov 8, 2016 at 11:39 AM, Michael Kjellman <
> mkjellman@internalcircle.com> wrote:
>
>> Yes, we hit this as well. We have an internal patch that I wrote to mostly
>> revert the behavior back to ByteBuffers with as small an amount of code change
>> as possible. Performance of our build is now even with 2.0.x, and we've also
>> forward-ported it to 3.x (although the 3.x patch was even more complicated
>> due to Bounds, RangeTombstoneBound and ClusteringPrefix, which actually
>> increase the number of allocations to somewhere between 11 and 13,
>> depending on how I count it, per indexed block -- making it even worse than
>> what you're observing in 2.1).
>>
>> We haven't upstreamed it, as 2.1 is obviously not taking any changes at
>> this point and the longer-term solution is
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 (which also includes
>> the changes to go back to ByteBuffers and remove as much of the Composites
>> from the storage engine as possible). Also, the solution is a bit of a hack --
>> although it was a blocker for us deploying 2.1 -- so I'm not sure how "hacky"
>> it is if it works.
>>
>> best,
>> kjellman
>>
>>
>> On Nov 8, 2016, at 11:31 AM, Dikang Gu <dikang85@gmail.com> wrote:
>>
>> This is very expensive:
>>
>> "MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10 tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
>>    java.lang.Thread.State: RUNNABLE
>>         at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
>>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
>>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
>>         at java.util.TreeMap.put(TreeMap.java:545)
>>         at java.util.TreeSet.add(TreeSet.java:255)
>>         at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
>>         at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
>>         at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
>>         at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
>>         at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
>>         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
>>
>>
>> Checked the git history; it comes from this jira:
>> https://issues.apache.org/jira/browse/CASSANDRA-5417
>>
>> Any thoughts?
>>
>> On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pauloricardomg@gmail.com> wrote:
>> Haven't seen this before, but perhaps it's related to CASSANDRA-10433?
>> This is just a wild guess as it's in a related codepath, but maybe worth
>> trying out the patch available to see if it helps anything...
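
For anyone wanting to try that, a rough sketch of building a patched 2.1 from source (the patch filename is a placeholder for whatever is attached to the JIRA ticket):

    git clone https://github.com/apache/cassandra.git
    cd cassandra
    git checkout cassandra-2.1
    git apply /path/to/CASSANDRA-10433.patch   # placeholder filename
    ant jar                                    # build the patched jar under build/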
>>
>> 2016-10-28 15:03 GMT-02:00 Dikang Gu <dikang85@gmail.com>:
>> We are seeing huge cpu regression when upgrading one of our 2.0.16
>> cluster to 2.1.14 as well. The 2.1.14 node is not able to handle the same
>> amount of read traffic as the 2.0.16 node, actually, it's less than 50%.
>>
>> And in the perf results, the first line could go as high as 50%, as we
>> turn up the read traffic, which never appeared in 2.0.16.
>>
>> Any thoughts?
>> Thanks
>>
>>
>> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
>> Overhead  Shared Object                          Symbol
>>    6.52%  perf-196410.map                        [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    4.84%  libzip.so                              [.] adler32
>>    2.88%  perf-196410.map                        [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
>>    2.39%  perf-196410.map                        [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
>>    2.03%  perf-196410.map                        [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>>    1.65%  perf-196410.map                        [.] vtable chunks
>>    1.44%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>>    1.02%  perf-196410.map                        [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    1.00%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x0000000000003804
>>    0.87%  perf-196410.map                        [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>>    0.82%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036dc
>>    0.79%  [kernel]                               [k] copy_user_generic_string
>>    0.73%  perf-196410.map                        [.] vtable chunks
>>    0.71%  perf-196410.map                        [.] Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>>    0.70%  [kernel]                               [k] find_busiest_group
>>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>>    0.68%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>>    0.65%  perf-196410.map                        [.] jbyte_disjoint_arraycopy
>>    0.64%  [kernel]                               [k] _raw_spin_lock
>>    0.63%  [kernel]                               [k] __schedule
>>    0.45%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036df
>>
>> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <corry@bestdata.be> wrote:
>> @JC, Get the pid of your target java process (something like "ps -ef |
>> grep -i cassandra") .
>> Then do a kill -3 <pid> (at unix/linux)
>> Check the stdout logfile of the process.
>>  it should contain the threaddump.
>> If you found it, then great!
>> Let that kill -3 loop for about 2 or 3 minutes.
>> Herafter copy paste and load the stdout file into one if the mentioned
>> tools.
>> If you are not familiar with the java internals, then those threaddumps
>> will learn you a lot:)
>>
>>
>>
>>
>> On Friday, January 29, 2016, Jean Carlo <jean.jeancarl48@gmail.com> wrote:
>> I am having the same issue after upgrade cassandra 2.1.12 from 2.0.10. I
>> am not good on jvm so I would like to know how to do what @CorryOpdenakker
>> propose with cassandra.
>>
>> :)
>>
>> I check concurrent_compactors
>>
>>
>> Saludos
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>> wrote:
>> Hi guys,
>> Cassandra is still new for me, but I have a lot of java tuning experience.
>>
>> For root cause detection of performance degradations its always good to
>> start with collecting a series of java thread dumps. Take at problem
>> occurrence using a loopscript for example 60 thread dumps with an interval
>> of 1 or 2 seconds.
>> Then load those dumps into IBM thread dump analyzer or in "eclipse mat"
>> or any similar tool and see which methods appear to be most active or
>> blocking others.
>>
>> Its really very useful
>>
>> Same can be be done in a normal situation to compare the difference.
>>
>> That should give more insights.
>>
>> Cheers, Corry
>>
>>
>> On Friday, January 29, 2016, Peddi, Praveen <pe...@amazon.com> wrote:
>> Hello,
>> We have another update on performance on 2.1.11. compression_chunk_size
>> didn’t really help much but We changed concurrent_compactors from default
>> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
>> read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX
>> metric that could affect read latencies is that 2.1.11 is running
>> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
>> 2.0.9 even though our read_repair_chance is same on both. Could anyone shed
>> some light on why 2.1.11 could be running read repair 10 to 50 times more
>> in spite of same configuration on both clusters?
>>
>> dclocal_read_repair_chance=0.100000 AND
>> read_repair_chance=0.000000 AND
>>
>> Here is the table for read repair metrics for both clusters.
>>                 2.0.9   2.1.11
>> ReadRepairedBackground  5MinAvg 0.006   0.1
>>         15MinAvg        0.009   0.153
>> ReadRepairedBlocking    5MinAvg 0.002   0.55
>>         15MinAvg        0.007   0.91
>>
>> Thanks
>> Praveen
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 2:58 PM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Sorry I wasn’t as explicit as I should have been
>>
>> The same buffer size is used by compressed reads as well, but tuned with
>> compression_chunk_size table property. It’s likely true that if you lower
>> compression_chunk_size, you’ll see improved read performance.
>>
>> This was covered in the AWS re:Invent youtube link I sent in my original
>> reply.
>>
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thursday, January 14, 2016 at 11:36 AM
>> To: "user@cassandra.apache.org", Zhiyan Shao
>> Cc: "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We will try with reduced “rar_buffer_size” to 4KB. However
>> CASSANDRA-10249<https://issues.apache.org/jira/browse/CASSANDRA-10249>
>> says "this only affects users who have 1. disabled compression, 2. switched
>> to buffered i/o from mmap’d”. None of this is true for us I believe. We use
>> default disk_access_mode which should be mmap. We also used LZ4Compressor
>> when created table.
>>
>>
>> We will let you know if this property had any effect. We were testing
>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>> latest version.
>>
>> Praveen
>>
>>
>>
>>
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 1:29 PM
>> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <
>> user@cassandra.apache.org>
>> Cc: "Agrawal, Pratik" <pa...@amazon.com>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>> / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not
>> this is really the case depends on how much of your data is in page cache,
>> and whether or not you’re using mmap. Since the original question was asked
>> by someone using small RAM instances, it’s possible.
>>
>> We mitigate this by dropping compression_chunk_size in order to force a
>> smaller buffer on reads, so we don’t over read very small blocks. This has
>> other side effects (lower compression ratio, more garbage during
>> streaming), but significantly speeds up read workloads for us.
>>
>>
>> From: Zhiyan Shao
>> Date: Thursday, January 14, 2016 at 9:49 AM
>> To: "user@cassandra.apache.org"
>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>> forum, you can find another thread I sent a while ago. The perf test I did
>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>> 2.0.14.
>>
>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>> Thanks Jeff for your reply. Sorry for delayed response. We were running
>> some more tests and wanted to wait for the results.
>>
>> So basically we saw higher CPU with 2.1.11 was higher compared to 2.0.9
>> (see below) for the same exact load test. Memory spikes were also
>> aggressive on 2.1.11.
>>
>> So we wanted to rule out any of our custom setting so we ended up doing
>> some testing with Cassandra stress test and default Cassandra installation.
>> Here are the results we saw between 2.0.9 and 2.1.11. Both are default
>> installations and both use Cassandra stress test with same params. This is
>> the closest apple-apple comparison we can get. As you can see both read and
>> write latencies are 30 to 50% worse in 2.1.11 than 2.0.9. Since we are
>> using default installation.
>>
>> Highlights of the test:
>> Load: 2x reads and 1x writes
>> CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>> Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
>> One observation is, As the number of threads are increased, 2.1.11 read
>> latencies are getting worse compared to 2.0.9 (see below table for 24
>> threads vs 54 threads)
>> Not sure if anyone has done this kind of comparison before and what their
>> thoughts are. I am thinking for this same reason
>>
>> 2.0.9 Plain      type         total ops     op/s            pk/s
>>  row/s            mean             med        0.95    0.99    0.999
>> max           time
>>  16 threadCount  READ   66854   7205    7205    7205    1.6     1.3
>>  2.8     3.5     9.6     85.3    9.3
>>  16 threadCount  WRITE  33146   3572    3572    3572    1.3     1
>>  2.6     3.3     7       206.5   9.3
>>  16 threadCount  total  100000  10777   10777   10777   1.5     1.3
>>  2.7     3.4     7.9     206.5   9.3
>> 2.1.11 Plain
>>  16 threadCount  READ   67096   6818    6818    6818    1.6     1.5
>>  2.6     3.5     7.9     61.7    9.8
>>  16 threadCount  WRITE  32904   3344    3344    3344    1.4     1.3
>>  2.3     3       6.5     56.7    9.8
>>  16 threadCount  total  100000  10162   10162   10162   1.6     1.4
>>  2.5     3.2     6       61.7    9.8
>> 2.0.9 Plain
>>  24 threadCount  READ   66414   8167    8167    8167    2       1.6
>>  3.7     7.5     16.7    208     8.1
>>  24 threadCount  WRITE  33586   4130    4130    4130    1.7     1.3
>>  3.4     5.4     25.6    45.4    8.1
>>  24 threadCount  total  100000  12297   12297   12297   1.9     1.5
>>  3.5     6.2     15.2    208     8.1
>> 2.1.11 Plain
>>  24 threadCount  READ   66628   7433    7433    7433    2.2     2.1
>>  3.4     4.3     8.4     38.3    9
>>  24 threadCount  WRITE  33372   3723    3723    3723    2       1.9
>>  3.1     3.8     21.9    37.2    9
>>  24 threadCount  total  100000  11155   11155   11155   2.1     2
>>  3.3     4.1     8.8     38.3    9
>> 2.0.9 Plain
>>  54 threadCount  READ   67115   13419   13419   13419   2.8     2.6
>>  4.2     6.4     36.9    82.4    5
>>  54 threadCount  WRITE  32885   6575    6575    6575    2.5     2.3
>>  3.9     5.6     15.9    81.5    5
>>  54 threadCount  total  100000  19993   19993   19993   2.7     2.5
>>  4.1     5.7     13.9    82.4    5
>> 2.1.11 Plain
>>  54 threadCount  READ   66780   8951    8951    8951    4.3     3.9
>>  6.8     9.7     49.4    69.9    7.5
>>  54 threadCount  WRITE  33220   4453    4453    4453    3.5     3.2
>>  5.7     8.2     36.8    68      7.5
>>  54 threadCount  total  100000  13404   13404   13404   4       3.7
>>  6.6     9.2     48      69.9    7.5
>>
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Date: Thursday, January 7, 2016 at 1:01 AM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
>> Praveen <pe...@amazon.com>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>> experience was anywhere from 20-60%, depending on workload).
>>
>> However, it’s not necessarily true that everything behaves exactly the
>> same – in particular, memtables are different, commitlog segment handling
>> is different, and GC params may need to be tuned differently for 2.1 than
>> 2.0.
>>
>> When the system is busy, what’s it actually DOING? Cassandra exposes a
>> TON of metrics – have you plugged any into a reporting system to see what’s
>> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
>> pauses?
>>
>> My colleagues spent a lot of time validating different AWS EBS configs
>> (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4),
>> 2.1 was faster in almost every case, but you’re using an instance size I
>> don’t believe we tried (too little RAM to be viable in production).  c3.2xl
>> only gives you 15G of ram – most “performance” based systems want 2-4x that
>> (people running G1 heaps usually start at 16G heaps and leave another
>> 16-30G for page cache), you’re running fairly small hardware – it’s
>> possible that 2.1 isn’t “as good” on smaller hardware.
>>
>> (I do see your domain, presumably you know all of this, but just to be
>> sure):
>>
>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>> Which volume sizes? Are they the same between versions? Are you hitting
>> your iops limits? Running out of burst tokens? Do you have enhanced
>> networking enabled? At load, what part of your system is stressed? Are you
>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>> memtable_allocation_type -> offheap objects  (available in 2.1, not in 2.0)?
>>
>> Tuning gc_grace is weird – do you understand what it does? Are you
>> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
>> you doing a lot of compaction?
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Wednesday, January 6, 2016 at 11:41 AM
>> To: "user@cassandra.apache.org"
>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>> environment with pretty much same yaml settings in both (removed unused
>> yaml settings and renamed few others) and we have noticed performance on
>> 2.1.11 is worse compared to 2.0.9. After more investigation we found that
>> the performance gets worse as we increase replication factor on 2.1.11
>> where as on 2.0.9 performance is more or less same. Has anything
>> architecturally changed as far as replication is concerned in 2.1.11?
>>
>> All googling only suggested 2.1.11 should be FASTER than 2.0.9 so we are
>> obviously doing something different. However the client code, load test is
>> all identical in both cases.
>>
>> Details:
>> Nodes: 3 ec2 c3.2x large
>> R/W Consistency: QUORUM
>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>> removed unused properties from yaml file.
>> We run compaction aggressive compaction with low gc_grace (15 mins) but
>> this is true for both 2.0.9 and 2.1.11.
>>
>> As you can see, all p50, p90 and p99 latencies stayed with in 10%
>> difference on 2.0.9 when we increased RF from 1 to 3, where as on 2.1.11
>> latencies almost doubled (especially reads are much slower than writes).
>>
>> # Nodes         RF      # of rows       2.0.9   2.1.11
>> READ
>>                         P50     P90     P99     P50     P90     P99
>> 3       1       450     306     594     747     425     849     1085
>> 3       3       450     358     634     877     708     1274    2642
>>
>> WRITE
>> 3       1       10      26      80      179     37      131     196
>> 3       3       10      31      96      184     46      166     468
>>
>> Any pointers on how to debug performance issues will be appreciated.
>>
>> Praveen
>>
>>
>>
>> --
>> ----------------------------------
>> Bestdata.be
>> Optimised ict
>> Tel: +32(0)496609576
>> Corry@BestData.be
>> ----------------------------------
>>
>>
>>
>>
>> --
>> ----------------------------------
>> Bestdata.be
>> Optimised ict
>> Tel: +32(0)496609576
>> Corry@BestData.be
>> ----------------------------------
>>
>>
>>
>>
>> --
>> Dikang
>>
>>
>>
>>
>>
>> --
>> Dikang
>>
>>
>>
>
>
> --
> Dikang
>
>

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by "techpyaasa ." <te...@gmail.com>.
Hi guys,

Has anybody got fix for this issue?
Recently we upgraded our c* cluster from 2.0.17 to 2.1.17. And we saw
increase in read latency on few tables, read latency increased almost 2 to
3 times more than what it was in 2.0.17.

Kindly let me know the fix for it if anybody knows it.

Thanks
TechPyaasa

On Wed, Nov 9, 2016 at 1:28 AM, Dikang Gu <di...@gmail.com> wrote:

> Michael, thanks for the info. It sounds to me a very serious performance
> regression. :(
>
> On Tue, Nov 8, 2016 at 11:39 AM, Michael Kjellman <
> mkjellman@internalcircle.com> wrote:
>
>> Yes, We hit this as well. We have a internal patch that I wrote to mostly
>> revert the behavior back to ByteBuffers with as small amount of code change
>> as possible. Performance of our build is now even with 2.0.x and we've also
>> forward ported it to 3.x (although the 3.x patch was even more complicated
>> due to Bounds, RangeTombstoneBound, ClusteringPrefix which actually
>> increases the number of allocations to somewhere between 11 and 13
>> depending on how I count it per indexed block -- making it even worse than
>> what you're observing in 2.1.
>>
>> We haven't upstreamed it as 2.1 is obviously not taking any changes at
>> this point and the longer term solution is https://issues.apache.org/jira
>> /browse/CASSANDRA-9754 (which also includes the changes to go back to
>> ByteBuffers and remove as much of the Composites from the storage engine as
>> possible.) Also, the solution is a bit of a hack -- although it was a
>> blocker from us deploying 2.1 -- so i'm not sure how "hacky" it is if it
>> works..
>>
>> best,
>> kjellman
>>
>>
>> On Nov 8, 2016, at 11:31 AM, Dikang Gu <dikang85@gmail.com<mailto:dik
>> ang85@gmail.com>> wrote:
>>
>> This is very expensive:
>>
>> "MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10
>> tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
>>    java.lang.Thread.State: RUNNABLE
>>         at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerT
>> ype.java:29)
>>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameTyp
>> e.compare(AbstractSimpleCellNameType.java:98)
>>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameTyp
>> e.compare(AbstractSimpleCellNameType.java:31)
>>         at java.util.TreeMap.put(TreeMap.java:545)
>>         at java.util.TreeSet.add(TreeSet.java:255)
>>         at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.
>> deserialize(NamesQueryFilter.java:254)
>>         at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.
>> deserialize(NamesQueryFilter.java:228)
>>         at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.
>> deserialize(SliceByNamesReadCommand.java:104)
>>         at org.apache.cassandra.db.ReadCommandSerializer.deserialize(
>> ReadCommand.java:156)
>>         at org.apache.cassandra.db.ReadCommandSerializer.deserialize(
>> ReadCommand.java:132)
>>         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessag
>> e(IncomingTcpConnection.java:195)
>>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessag
>> es(IncomingTcpConnection.java:172)
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingT
>> cpConnection.java:88)
>>
>>
>> Checked the git history, it comes from this jira:
>> https://issues.apache.org/jira/browse/CASSANDRA-5417
>>
>> Any thoughts?
>> ​
>>
>> On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pauloricardomg@gmail.com
>> <ma...@gmail.com>> wrote:
>> Haven't seen this before, but perhaps it's related to CASSANDRA-10433?
>> This is just a wild guess as it's in a related codepath, but maybe worth
>> trying out the patch available to see if it helps anything...
>>
>> 2016-10-28 15:03 GMT-02:00 Dikang Gu <dikang85@gmail.com<mailto:dik
>> ang85@gmail.com>>:
>> We are seeing huge cpu regression when upgrading one of our 2.0.16
>> cluster to 2.1.14 as well. The 2.1.14 node is not able to handle the same
>> amount of read traffic as the 2.0.16 node, actually, it's less than 50%.
>>
>> And in the perf results, the first line could go as high as 50%, as we
>> turn up the read traffic, which never appeared in 2.0.16.
>>
>> Any thoughts?
>> Thanks
>>
>>
>> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
>> Overhead  Shared Object                          Symbol
>>    6.52%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/marshal/IntegerType;.compare in
>> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    4.84%  libzip.so                              [.] adler32
>>    2.88%  perf-196410.map                        [.]
>> Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marsh
>> al/IntegerType;.compare
>>    2.39%  perf-196410.map                        [.]
>> Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marsh
>> al/IntegerType;.findMostSignificantByte
>>    2.03%  perf-196410.map                        [.]
>> Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/Decor
>> atedKey;.compareTo
>>    1.65%  perf-196410.map                        [.] vtable chunks
>>    1.44%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/DecoratedKey;.compareTo in
>> Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>>    1.02%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    1.00%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsna
>> ppyjava.so/>        [.] 0x0000000000003804
>>    0.87%  perf-196410.map                        [.]
>> Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/Abstr
>> actCell$1;.computeNext
>>    0.82%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsna
>> ppyjava.so/>        [.] 0x00000000000036dc
>>    0.79%  [kernel]                               [k]
>> copy_user_generic_string
>>    0.73%  perf-196410.map                        [.] vtable chunks
>>    0.71%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable
>> in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>>    0.70%  [kernel]                               [k] find_busiest_group
>>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>>    0.68%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>>    0.65%  perf-196410.map                        [.]
>> jbyte_disjoint_arraycopy
>>    0.64%  [kernel]                               [k] _raw_spin_lock
>>    0.63%  [kernel]                               [k] __schedule
>>    0.45%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsna
>> ppyjava.so/>        [.] 0x00000000000036df
>>
>> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <corry@bestdata.be
>> <ma...@bestdata.be>> wrote:
>> @JC, Get the pid of your target java process (something like "ps -ef |
>> grep -i cassandra") .
>> Then do a kill -3 <pid> (at unix/linux)
>> Check the stdout logfile of the process.
>>  it should contain the threaddump.
>> If you found it, then great!
>> Let that kill -3 loop for about 2 or 3 minutes.
>> Herafter copy paste and load the stdout file into one if the mentioned
>> tools.
>> If you are not familiar with the java internals, then those threaddumps
>> will learn you a lot:)
>>
>>
>>
>>
>> Op vrijdag 29 januari 2016 heeft Jean Carlo <jean.jeancarl48@gmail.com
>> <ma...@gmail.com>> het volgende geschreven:
>> I am having the same issue after upgrade cassandra 2.1.12 from 2.0.10. I
>> am not good on jvm so I would like to know how to do what @CorryOpdenakker
>> propose with cassandra.
>>
>> :)
>>
>> I check concurrent_compactors
>>
>>
>> Saludos
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>> wrote:
>> Hi guys,
>> Cassandra is still new for me, but I have a lot of java tuning experience.
>>
>> For root cause detection of performance degradations its always good to
>> start with collecting a series of java thread dumps. Take at problem
>> occurrence using a loopscript for example 60 thread dumps with an interval
>> of 1 or 2 seconds.
>> Then load those dumps into IBM thread dump analyzer or in "eclipse mat"
>> or any similar tool and see which methods appear to be most active or
>> blocking others.
>>
>> Its really very useful
>>
>> Same can be be done in a normal situation to compare the difference.
>>
>> That should give more insights.
>>
>> Cheers, Corry
>>
>>
>> Op vrijdag 29 januari 2016 heeft Peddi, Praveen <pe...@amazon.com> het
>> volgende geschreven:
>> Hello,
>> We have another update on performance on 2.1.11. compression_chunk_size
>> didn’t really help much but We changed concurrent_compactors from default
>> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
>> read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX
>> metric that could affect read latencies is that 2.1.11 is running
>> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
>> 2.0.9 even though our read_repair_chance is same on both. Could anyone shed
>> some light on why 2.1.11 could be running read repair 10 to 50 times more
>> in spite of same configuration on both clusters?
>>
>> dclocal_read_repair_chance=0.100000 AND
>> read_repair_chance=0.000000 AND
>>
>> Here is the table for read repair metrics for both clusters.
>>                 2.0.9   2.1.11
>> ReadRepairedBackground  5MinAvg 0.006   0.1
>>         15MinAvg        0.009   0.153
>> ReadRepairedBlocking    5MinAvg 0.002   0.55
>>         15MinAvg        0.007   0.91
>>
>> Thanks
>> Praveen
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 2:58 PM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Sorry I wasn’t as explicit as I should have been
>>
>> The same buffer size is used by compressed reads as well, but tuned with
>> compression_chunk_size table property. It’s likely true that if you lower
>> compression_chunk_size, you’ll see improved read performance.
>>
>> This was covered in the AWS re:Invent youtube link I sent in my original
>> reply.
>>
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thursday, January 14, 2016 at 11:36 AM
>> To: "user@cassandra.apache.org", Zhiyan Shao
>> Cc: "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We will try with reduced “rar_buffer_size” to 4KB. However
>> CASSANDRA-10249<https://issues.apache.org/jira/browse/CASSANDRA-10249>
>> says "this only affects users who have 1. disabled compression, 2. switched
>> to buffered i/o from mmap’d”. None of this is true for us I believe. We use
>> default disk_access_mode which should be mmap. We also used LZ4Compressor
>> when created table.
>>
>>
>> We will let you know if this property had any effect. We were testing
>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>> latest version.
>>
>> Praveen
>>
>>
>>
>>
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Reply-To: <us...@cassandra.apache.org>
>> Date: Thursday, January 14, 2016 at 1:29 PM
>> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <
>> user@cassandra.apache.org>
>> Cc: "Agrawal, Pratik" <pa...@amazon.com>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>> / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not
>> this is really the case depends on how much of your data is in page cache,
>> and whether or not you’re using mmap. Since the original question was asked
>> by someone using small RAM instances, it’s possible.
>>
>> We mitigate this by dropping compression_chunk_size in order to force a
>> smaller buffer on reads, so we don’t over read very small blocks. This has
>> other side effects (lower compression ratio, more garbage during
>> streaming), but significantly speeds up read workloads for us.
>>
>>
>> From: Zhiyan Shao
>> Date: Thursday, January 14, 2016 at 9:49 AM
>> To: "user@cassandra.apache.org"
>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>> forum, you can find another thread I sent a while ago. The perf test I did
>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>> 2.0.14.
>>
>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
>> Thanks Jeff for your reply. Sorry for delayed response. We were running
>> some more tests and wanted to wait for the results.
>>
>> So basically we saw higher CPU with 2.1.11 was higher compared to 2.0.9
>> (see below) for the same exact load test. Memory spikes were also
>> aggressive on 2.1.11.
>>
>> So we wanted to rule out any of our custom setting so we ended up doing
>> some testing with Cassandra stress test and default Cassandra installation.
>> Here are the results we saw between 2.0.9 and 2.1.11. Both are default
>> installations and both use Cassandra stress test with same params. This is
>> the closest apple-apple comparison we can get. As you can see both read and
>> write latencies are 30 to 50% worse in 2.1.11 than 2.0.9. Since we are
>> using default installation.
>>
>> Highlights of the test:
>> Load: 2x reads and 1x writes
>> CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>> Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
>> One observation is, As the number of threads are increased, 2.1.11 read
>> latencies are getting worse compared to 2.0.9 (see below table for 24
>> threads vs 54 threads)
>> Not sure if anyone has done this kind of comparison before and what their
>> thoughts are. I am thinking for this same reason
>>
>> 2.0.9 Plain      type         total ops     op/s            pk/s
>>  row/s            mean             med        0.95    0.99    0.999
>> max           time
>>  16 threadCount  READ   66854   7205    7205    7205    1.6     1.3
>>  2.8     3.5     9.6     85.3    9.3
>>  16 threadCount  WRITE  33146   3572    3572    3572    1.3     1
>>  2.6     3.3     7       206.5   9.3
>>  16 threadCount  total  100000  10777   10777   10777   1.5     1.3
>>  2.7     3.4     7.9     206.5   9.3
>> 2.1.11 Plain
>>  16 threadCount  READ   67096   6818    6818    6818    1.6     1.5
>>  2.6     3.5     7.9     61.7    9.8
>>  16 threadCount  WRITE  32904   3344    3344    3344    1.4     1.3
>>  2.3     3       6.5     56.7    9.8
>>  16 threadCount  total  100000  10162   10162   10162   1.6     1.4
>>  2.5     3.2     6       61.7    9.8
>> 2.0.9 Plain
>>  24 threadCount  READ   66414   8167    8167    8167    2       1.6
>>  3.7     7.5     16.7    208     8.1
>>  24 threadCount  WRITE  33586   4130    4130    4130    1.7     1.3
>>  3.4     5.4     25.6    45.4    8.1
>>  24 threadCount  total  100000  12297   12297   12297   1.9     1.5
>>  3.5     6.2     15.2    208     8.1
>> 2.1.11 Plain
>>  24 threadCount  READ   66628   7433    7433    7433    2.2     2.1
>>  3.4     4.3     8.4     38.3    9
>>  24 threadCount  WRITE  33372   3723    3723    3723    2       1.9
>>  3.1     3.8     21.9    37.2    9
>>  24 threadCount  total  100000  11155   11155   11155   2.1     2
>>  3.3     4.1     8.8     38.3    9
>> 2.0.9 Plain
>>  54 threadCount  READ   67115   13419   13419   13419   2.8     2.6
>>  4.2     6.4     36.9    82.4    5
>>  54 threadCount  WRITE  32885   6575    6575    6575    2.5     2.3
>>  3.9     5.6     15.9    81.5    5
>>  54 threadCount  total  100000  19993   19993   19993   2.7     2.5
>>  4.1     5.7     13.9    82.4    5
>> 2.1.11 Plain
>>  54 threadCount  READ   66780   8951    8951    8951    4.3     3.9
>>  6.8     9.7     49.4    69.9    7.5
>>  54 threadCount  WRITE  33220   4453    4453    4453    3.5     3.2
>>  5.7     8.2     36.8    68      7.5
>>  54 threadCount  total  100000  13404   13404   13404   4       3.7
>>  6.6     9.2     48      69.9    7.5
>>
>>
>> From: Jeff Jirsa <je...@crowdstrike.com>
>> Date: Thursday, January 7, 2016 at 1:01 AM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
>> Praveen <pe...@amazon.com>
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>> experience was anywhere from 20-60%, depending on workload).
>>
>> However, it’s not necessarily true that everything behaves exactly the
>> same – in particular, memtables are different, commitlog segment handling
>> is different, and GC params may need to be tuned differently for 2.1 than
>> 2.0.
>>
>> When the system is busy, what’s it actually DOING? Cassandra exposes a
>> TON of metrics – have you plugged any into a reporting system to see what’s
>> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
>> pauses?
>>
>> My colleagues spent a lot of time validating different AWS EBS configs
>> (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4),
>> 2.1 was faster in almost every case, but you’re using an instance size I
>> don’t believe we tried (too little RAM to be viable in production).  c3.2xl
>> only gives you 15G of ram – most “performance” based systems want 2-4x that
>> (people running G1 heaps usually start at 16G heaps and leave another
>> 16-30G for page cache), you’re running fairly small hardware – it’s
>> possible that 2.1 isn’t “as good” on smaller hardware.
>>
>> (I do see your domain, presumably you know all of this, but just to be
>> sure):
>>
>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>> Which volume sizes? Are they the same between versions? Are you hitting
>> your iops limits? Running out of burst tokens? Do you have enhanced
>> networking enabled? At load, what part of your system is stressed? Are you
>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>> memtable_allocation_type -> offheap objects  (available in 2.1, not in 2.0)?
>>
>> Tuning gc_grace is weird – do you understand what it does? Are you
>> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
>> you doing a lot of compaction?
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Wednesday, January 6, 2016 at 11:41 AM
>> To: "user@cassandra.apache.org"
>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>> environment with pretty much the same yaml settings in both (removed unused
>> yaml settings and renamed a few others) and we have noticed performance on
>> 2.1.11 is worse compared to 2.0.9. After more investigation we found that
>> the performance gets worse as we increase replication factor on 2.1.11
>> whereas on 2.0.9 performance is more or less the same. Has anything
>> architecturally changed as far as replication is concerned in 2.1.11?
>>
>> All googling only suggested 2.1.11 should be FASTER than 2.0.9 so we are
>> obviously doing something different. However, the client code and load test
>> are identical in both cases.
>>
>> Details:
>> Nodes: 3 ec2 c3.2x large
>> R/W Consistency: QUORUM
>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>> removed unused properties from yaml file.
>> We run aggressive compaction with low gc_grace (15 mins) but
>> this is true for both 2.0.9 and 2.1.11.
>>
>> As you can see, all p50, p90 and p99 latencies stayed within 10%
>> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>> latencies almost doubled (especially reads are much slower than writes).
>>
>> # Nodes         RF      # of rows       2.0.9   2.1.11
>> READ
>>                         P50     P90     P99     P50     P90     P99
>> 3       1       450     306     594     747     425     849     1085
>> 3       3       450     358     634     877     708     1274    2642
>>
>> WRITE
>> 3       1       10      26      80      179     37      131     196
>> 3       3       10      31      96      184     46      166     468
>>
>> Any pointers on how to debug performance issues will be appreciated.
>>
>> Praveen
>>
>>
>>
>> --
>> ----------------------------------
>> Bestdata.be<http://Bestdata.be>
>> Optimised ict
>> Tel:+32(0)496609576<tel:%2B32%280%29496609576>
>> Corry@BestData.be<ma...@BestData.be>
>> ----------------------------------
>>
>>
>>
>>
>> --
>> ----------------------------------
>> Bestdata.be<http://Bestdata.be>
>> Optimised ict
>> Tel:+32(0)496609576<tel:%2B32%280%29496609576>
>> Corry@BestData.be<ma...@BestData.be>
>> ----------------------------------
>>
>>
>>
>>
>> --
>> Dikang
>>
>>
>>
>>
>>
>> --
>> Dikang
>>
>>
>>
>
>
> --
> Dikang
>
>

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by Dikang Gu <di...@gmail.com>.
Michael, thanks for the info. It sounds to me like a very serious performance
regression. :(

On Tue, Nov 8, 2016 at 11:39 AM, Michael Kjellman <
mkjellman@internalcircle.com> wrote:

> Yes, we hit this as well. We have an internal patch that I wrote to mostly
> revert the behavior back to ByteBuffers with as small amount of code change
> as possible. Performance of our build is now even with 2.0.x and we've also
> forward ported it to 3.x (although the 3.x patch was even more complicated
> due to Bounds, RangeTombstoneBound, ClusteringPrefix which actually
> increases the number of allocations to somewhere between 11 and 13
> depending on how I count it per indexed block -- making it even worse than
> what you're observing in 2.1.)
>
> We haven't upstreamed it as 2.1 is obviously not taking any changes at
> this point and the longer term solution is https://issues.apache.org/
> jira/browse/CASSANDRA-9754 (which also includes the changes to go back to
> ByteBuffers and remove as much of the Composites from the storage engine as
> possible.) Also, the solution is a bit of a hack -- although it was a
> blocker to us deploying 2.1 -- so I'm not sure how "hacky" it is if it
> works.
>
> best,
> kjellman
>
>
> On Nov 8, 2016, at 11:31 AM, Dikang Gu <dikang85@gmail.com<mailto:dik
> ang85@gmail.com>> wrote:
>
> This is very expensive:
>
> "MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10
> tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.cassandra.db.marshal.IntegerType.compare(
> IntegerType.java:29)
>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.
> compare(AbstractSimpleCellNameType.java:98)
>         at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.
> compare(AbstractSimpleCellNameType.java:31)
>         at java.util.TreeMap.put(TreeMap.java:545)
>         at java.util.TreeSet.add(TreeSet.java:255)
>         at org.apache.cassandra.db.filter.NamesQueryFilter$
> Serializer.deserialize(NamesQueryFilter.java:254)
>         at org.apache.cassandra.db.filter.NamesQueryFilter$
> Serializer.deserialize(NamesQueryFilter.java:228)
>         at org.apache.cassandra.db.SliceByNamesReadCommandSeriali
> zer.deserialize(SliceByNamesReadCommand.java:104)
>         at org.apache.cassandra.db.ReadCommandSerializer.
> deserialize(ReadCommand.java:156)
>         at org.apache.cassandra.db.ReadCommandSerializer.
> deserialize(ReadCommand.java:132)
>         at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(
> IncomingTcpConnection.java:195)
>         at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(
> IncomingTcpConnection.java:172)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(
> IncomingTcpConnection.java:88)
>
>
> Checked the git history, it comes from this jira:
> https://issues.apache.org/jira/browse/CASSANDRA-5417
>
> Any thoughts?
> ​
>
> On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pauloricardomg@gmail.com<
> mailto:pauloricardomg@gmail.com>> wrote:
> Haven't seen this before, but perhaps it's related to CASSANDRA-10433?
> This is just a wild guess as it's in a related codepath, but maybe worth
> trying out the patch available to see if it helps anything...
>
> 2016-10-28 15:03 GMT-02:00 Dikang Gu <dikang85@gmail.com<mailto:dik
> ang85@gmail.com>>:
> We are seeing huge cpu regression when upgrading one of our 2.0.16 cluster
> to 2.1.14 as well. The 2.1.14 node is not able to handle the same amount of
> read traffic as the 2.0.16 node, actually, it's less than 50%.
>
> And in the perf results, the first line could go as high as 50%, as we
> turn up the read traffic, which never appeared in 2.0.16.
>
> Any thoughts?
> Thanks
>
>
> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
> Overhead  Shared Object                          Symbol
>    6.52%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/marshal/IntegerType;.compare in
> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    4.84%  libzip.so                              [.] adler32
>    2.88%  perf-196410.map                        [.]
> Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/
> marshal/IntegerType;.compare
>    2.39%  perf-196410.map                        [.]
> Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/
> marshal/IntegerType;.findMostSignificantByte
>    2.03%  perf-196410.map                        [.]
> Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/
> DecoratedKey;.compareTo
>    1.65%  perf-196410.map                        [.] vtable chunks
>    1.44%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/
> ConcurrentSkipListMap;.findNode
>    1.02%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    1.00%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-
> libsnappyjava.so/>        [.] 0x0000000000003804
>    0.87%  perf-196410.map                        [.]
> Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/
> AbstractCell$1;.computeNext
>    0.82%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-
> libsnappyjava.so/>        [.] 0x00000000000036dc
>    0.79%  [kernel]                               [k]
> copy_user_generic_string
>    0.73%  perf-196410.map                        [.] vtable chunks
>    0.71%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in
> Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>    0.70%  [kernel]                               [k] find_busiest_group
>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>    0.68%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>    0.65%  perf-196410.map                        [.]
> jbyte_disjoint_arraycopy
>    0.64%  [kernel]                               [k] _raw_spin_lock
>    0.63%  [kernel]                               [k] __schedule
>    0.45%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-
> libsnappyjava.so/>        [.] 0x00000000000036df
>
> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <corry@bestdata.be
> <ma...@bestdata.be>> wrote:
> @JC, Get the pid of your target java process (something like "ps -ef |
> grep -i cassandra") .
> Then do a kill -3 <pid> (at unix/linux)
> Check the stdout logfile of the process.
>  it should contain the threaddump.
> If you found it, then great!
> Run that kill -3 in a loop for about 2 or 3 minutes.
> Hereafter, copy and load the stdout file into one of the mentioned
> tools.
> If you are not familiar with the Java internals, then those threaddumps
> will teach you a lot :)
>
>
>
>
> On Friday, January 29, 2016, Jean Carlo <jean.jeancarl48@gmail.com<
> mailto:jean.jeancarl48@gmail.com>> wrote:
> I am having the same issue after upgrading Cassandra to 2.1.12 from 2.0.10. I
> am not good with the JVM, so I would like to know how to do what @CorryOpdenakker
> proposes with Cassandra.
>
> :)
>
> I check concurrent_compactors
>
>
> Regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
> wrote:
> Hi guys,
> Cassandra is still new for me, but I have a lot of java tuning experience.
>
> For root cause detection of performance degradations it's always good to
> start with collecting a series of Java thread dumps. At problem
> occurrence, use a loop script to take, for example, 60 thread dumps with an interval
> of 1 or 2 seconds.
> Then load those dumps into the IBM thread dump analyzer, "Eclipse MAT", or
> any similar tool and see which methods appear to be most active or blocking
> others.
>
> It's really very useful.
>
> The same can be done in a normal situation to compare the difference.
>
> That should give more insights.
>
> Cheers, Corry
>
>
> On Friday, January 29, 2016, Peddi, Praveen <pe...@amazon.com> wrote:
> Hello,
> We have another update on performance on 2.1.11. compression_chunk_size
> didn’t really help much, but we changed concurrent_compactors from the default
> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
> read latencies are still 1.5x slower than 2.0.9. One thing we noticed in the JMX
> metrics that could affect read latencies is that 2.1.11 is running
> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
> 2.0.9 even though our read_repair_chance is the same on both. Could anyone shed
> some light on why 2.1.11 could be running read repair 10 to 50 times more
> in spite of the same configuration on both clusters?
>
> dclocal_read_repair_chance=0.100000 AND
> read_repair_chance=0.000000 AND
>
> Here is the table for read repair metrics for both clusters.
>                 2.0.9   2.1.11
> ReadRepairedBackground  5MinAvg 0.006   0.1
>         15MinAvg        0.009   0.153
> ReadRepairedBlocking    5MinAvg 0.002   0.55
>         15MinAvg        0.007   0.91
>
> Thanks
> Praveen
>
> From: Jeff Jirsa <je...@crowdstrike.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Thursday, January 14, 2016 at 2:58 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Sorry, I wasn’t as explicit as I should have been.
>
> The same buffer size is used by compressed reads as well, but tuned with
> compression_chunk_size table property. It’s likely true that if you lower
> compression_chunk_size, you’ll see improved read performance.
>
> This was covered in the AWS re:Invent youtube link I sent in my original
> reply.
>
>
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, January 14, 2016 at 11:36 AM
> To: "user@cassandra.apache.org", Zhiyan Shao
> Cc: "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We will try with “rar_buffer_size” reduced to 4KB. However, CASSANDRA-10249<
> https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this only
> affects users who have 1. disabled compression, 2. switched to buffered i/o
> from mmap’d”. None of this is true for us, I believe. We use the default
> disk_access_mode, which should be mmap. We also used LZ4Compressor when
> creating the table.
>
> We will let you know if this property had any effect. We were testing with
> 2.1.11 and this was only fixed in 2.1.12, so we need to play with the latest
> version.
>
> Praveen
>
>
>
>
>
> From: Jeff Jirsa <je...@crowdstrike.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Thursday, January 14, 2016 at 1:29 PM
> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> Cc: "Agrawal, Pratik" <pa...@amazon.com>
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
> / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not
> this is really the case depends on how much of your data is in page cache,
> and whether or not you’re using mmap. Since the original question was asked
> by someone using small RAM instances, it’s possible.
>
> We mitigate this by dropping compression_chunk_size in order to force a
> smaller buffer on reads, so we don’t over read very small blocks. This has
> other side effects (lower compression ratio, more garbage during
> streaming), but significantly speeds up read workloads for us.
>
>
> From: Zhiyan Shao
> Date: Thursday, January 14, 2016 at 9:49 AM
> To: "user@cassandra.apache.org"
> Cc: Jeff Jirsa, "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
> forum, you can find another thread I sent a while ago. The perf test I did
> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
> 2.0.14.
>
> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
> Thanks Jeff for your reply. Sorry for the delayed response. We were running
> some more tests and wanted to wait for the results.
>
> So basically we saw that CPU with 2.1.11 was higher compared to 2.0.9
> (see below) for the same exact load test. Memory spikes were also
> aggressive on 2.1.11.
>
> So we wanted to rule out any of our custom settings, so we ended up doing
> some testing with the Cassandra stress test and a default Cassandra installation.
> Here are the results we saw between 2.0.9 and 2.1.11. Both are default
> installations and both use the Cassandra stress test with the same params. This is
> the closest apples-to-apples comparison we can get. As you can see, both read and
> write latencies are 30 to 50% worse in 2.1.11 than 2.0.9, and this is
> with the default installation.
>
> Highlights of the test:
> Load: 2x reads and 1x writes
> CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
> Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
> One observation is that as the number of threads is increased, 2.1.11 read
> latencies get worse compared to 2.0.9 (see the table below for 24
> threads vs 54 threads).
> Not sure if anyone has done this kind of comparison before and what their
> thoughts are. I am thinking for this same reason
>
> 2.0.9 Plain      type   total ops       op/s    pk/s    row/s   mean    med     0.95    0.99    0.999   max     time
>  16 threadCount  READ   66854   7205    7205    7205    1.6     1.3     2.8     3.5     9.6     85.3    9.3
>  16 threadCount  WRITE  33146   3572    3572    3572    1.3     1       2.6     3.3     7       206.5   9.3
>  16 threadCount  total  100000  10777   10777   10777   1.5     1.3     2.7     3.4     7.9     206.5   9.3
> 2.1.11 Plain
>  16 threadCount  READ   67096   6818    6818    6818    1.6     1.5     2.6     3.5     7.9     61.7    9.8
>  16 threadCount  WRITE  32904   3344    3344    3344    1.4     1.3     2.3     3       6.5     56.7    9.8
>  16 threadCount  total  100000  10162   10162   10162   1.6     1.4     2.5     3.2     6       61.7    9.8
> 2.0.9 Plain
>  24 threadCount  READ   66414   8167    8167    8167    2       1.6     3.7     7.5     16.7    208     8.1
>  24 threadCount  WRITE  33586   4130    4130    4130    1.7     1.3     3.4     5.4     25.6    45.4    8.1
>  24 threadCount  total  100000  12297   12297   12297   1.9     1.5     3.5     6.2     15.2    208     8.1
> 2.1.11 Plain
>  24 threadCount  READ   66628   7433    7433    7433    2.2     2.1     3.4     4.3     8.4     38.3    9
>  24 threadCount  WRITE  33372   3723    3723    3723    2       1.9     3.1     3.8     21.9    37.2    9
>  24 threadCount  total  100000  11155   11155   11155   2.1     2       3.3     4.1     8.8     38.3    9
> 2.0.9 Plain
>  54 threadCount  READ   67115   13419   13419   13419   2.8     2.6     4.2     6.4     36.9    82.4    5
>  54 threadCount  WRITE  32885   6575    6575    6575    2.5     2.3     3.9     5.6     15.9    81.5    5
>  54 threadCount  total  100000  19993   19993   19993   2.7     2.5     4.1     5.7     13.9    82.4    5
> 2.1.11 Plain
>  54 threadCount  READ   66780   8951    8951    8951    4.3     3.9     6.8     9.7     49.4    69.9    7.5
>  54 threadCount  WRITE  33220   4453    4453    4453    3.5     3.2     5.7     8.2     36.8    68      7.5
>  54 threadCount  total  100000  13404   13404   13404   4       3.7     6.6     9.2     48      69.9    7.5
>
>
> From: Jeff Jirsa <je...@crowdstrike.com>
> Date: Thursday, January 7, 2016 at 1:01 AM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
> Praveen <pe...@amazon.com>
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
> experience was anywhere from 20-60%, depending on workload).
>
> However, it’s not necessarily true that everything behaves exactly the
> same – in particular, memtables are different, commitlog segment handling
> is different, and GC params may need to be tuned differently for 2.1 than
> 2.0.
>
> When the system is busy, what’s it actually DOING? Cassandra exposes a TON
> of metrics – have you plugged any into a reporting system to see what’s
> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
> pauses?
>
> My colleagues spent a lot of time validating different AWS EBS configs
> (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4), 2.1
> was faster in almost every case, but you’re using an instance size I don’t
> believe we tried (too little RAM to be viable in production).  c3.2xl only
> gives you 15G of ram – most “performance” based systems want 2-4x that
> (people running G1 heaps usually start at 16G heaps and leave another
> 16-30G for page cache), you’re running fairly small hardware – it’s
> possible that 2.1 isn’t “as good” on smaller hardware.
>
> (I do see your domain, presumably you know all of this, but just to be
> sure):
>
> You’re using c3, so presumably you’re using EBS – are you using GP2? Which
> volume sizes? Are they the same between versions? Are you hitting your iops
> limits? Running out of burst tokens? Do you have enhanced networking
> enabled? At load, what part of your system is stressed? Are you cpu bound?
> Are you seeing GC pauses hurt latency? Have you tried changing
> memtable_allocation_type -> offheap objects  (available in 2.1, not in 2.0)?
>
> Tuning gc_grace is weird – do you understand what it does? Are you
> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
> you doing a lot of compaction?
>
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, January 6, 2016 at 11:41 AM
> To: "user@cassandra.apache.org"
> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
> environment with pretty much the same yaml settings in both (removed unused
> yaml settings and renamed a few others) and we have noticed performance on
> 2.1.11 is worse compared to 2.0.9. After more investigation we found that
> the performance gets worse as we increase replication factor on 2.1.11
> whereas on 2.0.9 performance is more or less the same. Has anything
> architecturally changed as far as replication is concerned in 2.1.11?
>
> All googling only suggested 2.1.11 should be FASTER than 2.0.9 so we are
> obviously doing something different. However, the client code and load test
> are identical in both cases.
>
> Details:
> Nodes: 3 ec2 c3.2x large
> R/W Consistency: QUORUM
> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
> removed unused properties from yaml file.
> We run aggressive compaction with low gc_grace (15 mins) but
> this is true for both 2.0.9 and 2.1.11.
>
> As you can see, all p50, p90 and p99 latencies stayed within 10%
> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
> latencies almost doubled (especially reads are much slower than writes).
>
> # Nodes         RF      # of rows       2.0.9   2.1.11
> READ
>                         P50     P90     P99     P50     P90     P99
> 3       1       450     306     594     747     425     849     1085
> 3       3       450     358     634     877     708     1274    2642
>
> WRITE
> 3       1       10      26      80      179     37      131     196
> 3       3       10      31      96      184     46      166     468
>
> Any pointers on how to debug performance issues will be appreciated.
>
> Praveen
>
>
>
> --
> ----------------------------------
> Bestdata.be<http://Bestdata.be>
> Optimised ict
> Tel:+32(0)496609576<tel:%2B32%280%29496609576>
> Corry@BestData.be<ma...@BestData.be>
> ----------------------------------
>
>
>
>
> --
> ----------------------------------
> Bestdata.be<http://Bestdata.be>
> Optimised ict
> Tel:+32(0)496609576<tel:%2B32%280%29496609576>
> Corry@BestData.be<ma...@BestData.be>
> ----------------------------------
>
>
>
>
> --
> Dikang
>
>
>
>
>
> --
> Dikang
>
>
>


-- 
Dikang

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by Michael Kjellman <mk...@internalcircle.com>.
Yes, we hit this as well. We have an internal patch that I wrote to mostly revert the behavior back to ByteBuffers with as small an amount of code change as possible. Performance of our build is now even with 2.0.x and we've also forward ported it to 3.x (although the 3.x patch was even more complicated due to Bounds, RangeTombstoneBound, and ClusteringPrefix, which actually increase the number of allocations to somewhere between 11 and 13, depending on how I count it, per indexed block -- making it even worse than what you're observing in 2.1).

We haven't upstreamed it as 2.1 is obviously not taking any changes at this point, and the longer-term solution is https://issues.apache.org/jira/browse/CASSANDRA-9754 (which also includes the changes to go back to ByteBuffers and remove as much of the Composites from the storage engine as possible). Also, the solution is a bit of a hack -- although it was a blocker to us deploying 2.1 -- so I'm not sure how "hacky" it is if it works.

best,
kjellman


On Nov 8, 2016, at 11:31 AM, Dikang Gu <di...@gmail.com>> wrote:

This is very expensive:

"MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10 tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
        at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
        at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
        at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)


Checked the git history, it comes from this jira: https://issues.apache.org/jira/browse/CASSANDRA-5417

Any thoughts?
​

On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pa...@gmail.com>> wrote:
Haven't seen this before, but perhaps it's related to CASSANDRA-10433? This is just a wild guess as it's in a related codepath, but maybe worth trying out the patch available to see if it helps anything...

2016-10-28 15:03 GMT-02:00 Dikang Gu <di...@gmail.com>>:
We are seeing huge cpu regression when upgrading one of our 2.0.16 cluster to 2.1.14 as well. The 2.1.14 node is not able to handle the same amount of read traffic as the 2.0.16 node, actually, it's less than 50%.

And in the perf results, the first line could go as high as 50%, as we turn up the read traffic, which never appeared in 2.0.16.

Any thoughts?
Thanks


Samples: 952K of event 'cycles', Event count (approx.): 229681774560
Overhead  Shared Object                          Symbol
   6.52%  perf-196410.map                        [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   4.84%  libzip.so                              [.] adler32
   2.88%  perf-196410.map                        [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
   2.39%  perf-196410.map                        [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
   2.03%  perf-196410.map                        [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   1.65%  perf-196410.map                        [.] vtable chunks
   1.44%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
   1.02%  perf-196410.map                        [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
   1.00%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsnappyjava.so/>        [.] 0x0000000000003804
   0.87%  perf-196410.map                        [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.82%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsnappyjava.so/>        [.] 0x00000000000036dc
   0.79%  [kernel]                               [k] copy_user_generic_string
   0.73%  perf-196410.map                        [.] vtable chunks
   0.71%  perf-196410.map                        [.] Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
   0.70%  [kernel]                               [k] find_busiest_group
   0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
   0.68%  perf-196410.map                        [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo
   0.65%  perf-196410.map                        [.] jbyte_disjoint_arraycopy
   0.64%  [kernel]                               [k] _raw_spin_lock
   0.63%  [kernel]                               [k] __schedule
   0.45%  snappy-1.0.5.2-libsnappyjava.so<http://snappy-1.0.5.2-libsnappyjava.so/>        [.] 0x00000000000036df
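
A profile like the one above can be captured with Linux perf plus a JIT symbol map (the perf-196410.map file in the listing is such a map). A minimal sketch, assuming perf is installed, perf-map-agent is checked out under ~/perf-map-agent, and $PID holds the Cassandra process id; frequency, duration and paths are illustrative:

    # sample on-CPU stacks for 60 seconds at 99 Hz
    perf record -F 99 -g -p "$PID" -- sleep 60

    # write /tmp/perf-$PID.map so perf can symbolize JIT-compiled Java frames
    ~/perf-map-agent/bin/create-java-perf-map.sh "$PID"

    # interactive report, sorted by overhead
    perf report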

On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <co...@bestdata.be>> wrote:
@JC, Get the pid of your target java process (something like "ps -ef | grep -i cassandra") .
Then do a kill -3 <pid> (at unix/linux)
Check the stdout logfile of the process.
 it should contain the threaddump.
If you found it, then great!
Run that kill -3 in a loop for about 2 or 3 minutes.
Hereafter, copy and load the stdout file into one of the mentioned tools.
If you are not familiar with the Java internals, then those threaddumps will teach you a lot :)
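
A minimal sketch of such a loop, assuming a Linux shell and that the JVM's stdout goes to the usual Cassandra output log (the process lookup, count, interval and log path below are illustrative, not taken from this thread):

    # find the Cassandra JVM and dump its threads every 2 seconds for ~2 minutes
    PID=$(pgrep -f CassandraDaemon | head -n 1)
    for i in $(seq 1 60); do
        kill -3 "$PID"    # the JVM appends a full thread dump to its stdout log
        sleep 2
    done
    # dumps end up in the stdout log, e.g. /var/log/cassandra/output.log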




On Friday, January 29, 2016, Jean Carlo <je...@gmail.com>> wrote:
I am having the same issue after upgrading Cassandra to 2.1.12 from 2.0.10. I am not good with the JVM, so I would like to know how to do what @CorryOpdenakker proposes with Cassandra.

:)

I check concurrent_compactors


Regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be> wrote:
Hi guys,
Cassandra is still new for me, but I have a lot of java tuning experience.

For root cause detection of performance degradations it's always good to start with collecting a series of Java thread dumps. At problem occurrence, use a loop script to take, for example, 60 thread dumps with an interval of 1 or 2 seconds.
Then load those dumps into the IBM thread dump analyzer, "Eclipse MAT", or any similar tool and see which methods appear to be most active or blocking others.

It's really very useful.

The same can be done in a normal situation to compare the difference.

That should give more insights.

Cheers, Corry


On Friday, January 29, 2016, Peddi, Praveen <pe...@amazon.com> wrote:
Hello,
We have another update on performance on 2.1.11. compression_chunk_size didn’t really help much, but we changed concurrent_compactors from the default to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11 read latencies are still 1.5x slower than 2.0.9. One thing we noticed in the JMX metrics that could affect read latencies is that 2.1.11 is running ReadRepairedBackground and ReadRepairedBlocking too frequently compared to 2.0.9 even though our read_repair_chance is the same on both. Could anyone shed some light on why 2.1.11 could be running read repair 10 to 50 times more in spite of the same configuration on both clusters?

dclocal_read_repair_chance=0.100000 AND
read_repair_chance=0.000000 AND

Here is the table for read repair metrics for both clusters.
                2.0.9   2.1.11
ReadRepairedBackground  5MinAvg 0.006   0.1
        15MinAvg        0.009   0.153
ReadRepairedBlocking    5MinAvg 0.002   0.55
        15MinAvg        0.007   0.91
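
One way to watch those two rates directly is over JMX; the MBeans behind them in the 2.1 metrics registry are org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking and name=RepairedBackground. A sketch, assuming a jmxterm uber-jar is available and JMX listens on the default port 7199 (the tool choice and port are assumptions):

    # print the 5- and 15-minute read-repair rates
    java -jar jmxterm-uber.jar -l localhost:7199 -n <<'EOF'
    get -b org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking FiveMinuteRate FifteenMinuteRate
    get -b org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground FiveMinuteRate FifteenMinuteRate
    EOF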

Thanks
Praveen

From: Jeff Jirsa <je...@crowdstrike.com>
Reply-To: <us...@cassandra.apache.org>
Date: Thursday, January 14, 2016 at 2:58 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Sorry, I wasn’t as explicit as I should have been.

The same buffer size is used by compressed reads as well, but tuned with compression_chunk_size table property. It’s likely true that if you lower compression_chunk_size, you’ll see improved read performance.

This was covered in the AWS re:Invent youtube link I sent in my original reply.
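
For reference, a sketch of that change on a 2.1-era table (keyspace and table names are placeholders; chunk_length_kb is the CQL name for the compression chunk size, default 64):

    # shrink the compression chunk size to 4 KB, then rewrite sstables so it takes effect
    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 4};"
    nodetool upgradesstables -a my_keyspace my_table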



From: "Peddi, Praveen"
Reply-To: "user@cassandra.apache.org"
Date: Thursday, January 14, 2016 at 11:36 AM
To: "user@cassandra.apache.org", Zhiyan Shao
Cc: "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Hi,
We will try with “rar_buffer_size” reduced to 4KB. However, CASSANDRA-10249<https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this only affects users who have 1. disabled compression, 2. switched to buffered i/o from mmap’d”. None of this is true for us, I believe. We use the default disk_access_mode, which should be mmap. We also used LZ4Compressor when creating the table.

We will let you know if this property had any effect. We were testing with 2.1.11 and this was only fixed in 2.1.12, so we need to play with the latest version.

Praveen





From: Jeff Jirsa <je...@crowdstrike.com>
Reply-To: <us...@cassandra.apache.org>
Date: Thursday, January 14, 2016 at 1:29 PM
To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <us...@cassandra.apache.org>
Cc: "Agrawal, Pratik" <pa...@amazon.com>
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is really the case depends on how much of your data is in page cache, and whether or not you’re using mmap. Since the original question was asked by someone using small RAM instances, it’s possible.

We mitigate this by dropping compression_chunk_size in order to force a smaller buffer on reads, so we don’t over read very small blocks. This has other side effects (lower compression ratio, more garbage during streaming), but significantly speeds up read workloads for us.


From: Zhiyan Shao
Date: Thursday, January 14, 2016 at 9:49 AM
To: "user@cassandra.apache.org"
Cc: Jeff Jirsa, "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you can find another thread I sent a while ago. The perf test I did indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with 2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com> wrote:
Thanks Jeff for your reply. Sorry for the delayed response. We were running some more tests and wanted to wait for the results.

So basically we saw that CPU with 2.1.11 was higher compared to 2.0.9 (see below) for the same exact load test. Memory spikes were also aggressive on 2.1.11.

So we wanted to rule out any of our custom settings, so we ended up doing some testing with the Cassandra stress test and a default Cassandra installation. Here are the results we saw between 2.0.9 and 2.1.11. Both are default installations and both use the Cassandra stress test with the same params. This is the closest apples-to-apples comparison we can get. As you can see, both read and write latencies are 30 to 50% worse in 2.1.11 than 2.0.9, and this is with the default installation.

Highlights of the test:
Load: 2x reads and 1x writes
CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
One observation is that as the number of threads is increased, 2.1.11 read latencies get worse compared to 2.0.9 (see below table for 24 threads vs 54 threads).
Not sure if anyone has done this kind of comparison before and what their thoughts are. I am thinking for this same reason

2.0.9 Plain      type         total ops     op/s            pk/s           row/s            mean             med        0.95    0.99    0.999        max           time
 16 threadCount  READ   66854   7205    7205    7205    1.6     1.3     2.8     3.5     9.6     85.3    9.3
 16 threadCount  WRITE  33146   3572    3572    3572    1.3     1       2.6     3.3     7       206.5   9.3
 16 threadCount  total  100000  10777   10777   10777   1.5     1.3     2.7     3.4     7.9     206.5   9.3
2.1.11 Plain
 16 threadCount  READ   67096   6818    6818    6818    1.6     1.5     2.6     3.5     7.9     61.7    9.8
 16 threadCount  WRITE  32904   3344    3344    3344    1.4     1.3     2.3     3       6.5     56.7    9.8
 16 threadCount  total  100000  10162   10162   10162   1.6     1.4     2.5     3.2     6       61.7    9.8
2.0.9 Plain
 24 threadCount  READ   66414   8167    8167    8167    2       1.6     3.7     7.5     16.7    208     8.1
 24 threadCount  WRITE  33586   4130    4130    4130    1.7     1.3     3.4     5.4     25.6    45.4    8.1
 24 threadCount  total  100000  12297   12297   12297   1.9     1.5     3.5     6.2     15.2    208     8.1
2.1.11 Plain
 24 threadCount  READ   66628   7433    7433    7433    2.2     2.1     3.4     4.3     8.4     38.3    9
 24 threadCount  WRITE  33372   3723    3723    3723    2       1.9     3.1     3.8     21.9    37.2    9
 24 threadCount  total  100000  11155   11155   11155   2.1     2       3.3     4.1     8.8     38.3    9
2.0.9 Plain
 54 threadCount  READ   67115   13419   13419   13419   2.8     2.6     4.2     6.4     36.9    82.4    5
 54 threadCount  WRITE  32885   6575    6575    6575    2.5     2.3     3.9     5.6     15.9    81.5    5
 54 threadCount  total  100000  19993   19993   19993   2.7     2.5     4.1     5.7     13.9    82.4    5
2.1.11 Plain
 54 threadCount  READ   66780   8951    8951    8951    4.3     3.9     6.8     9.7     49.4    69.9    7.5
 54 threadCount  WRITE  33220   4453    4453    4453    3.5     3.2     5.7     8.2     36.8    68      7.5
 54 threadCount  total  100000  13404   13404   13404   4       3.7     6.6     9.2     48      69.9    7.5
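
For reference, a run shaped like the ones above can be reproduced with the stress tool shipped with each version; a rough sketch using the 2.1-style syntax (exact flags differ between the 2.0 and 2.1 tools, so treat this as a starting point to adapt):

    cassandra-stress write n=100000 cl=QUORUM -rate threads=16        # seed the data set
    cassandra-stress mixed ratio\(read=2,write=1\) n=100000 cl=QUORUM -rate threads=16
    # repeat the mixed run with -rate threads=24 and threads=54 to see how latency scales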


From: Jeff Jirsa <je...@crowdstrike.com>
Date: Thursday, January 7, 2016 at 1:01 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi Praveen <pe...@amazon.com>
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our experience was anywhere from 20-60%, depending on workload).

However, it’s not necessarily true that everything behaves exactly the same – in particular, memtables are different, commitlog segment handling is different, and GC params may need to be tuned differently for 2.1 than 2.0.

When the system is busy, what’s it actually DOING? Cassandra exposes a TON of metrics – have you plugged any into a reporting system to see what’s going on? Is your latency due to pegged cpu, iowait/disk queues or gc pauses?
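
A few quick checks along those lines (table names are placeholders; the log path assumes a package install):

    nodetool tpstats                                    # pending/blocked/dropped tasks per stage
    nodetool cfhistograms my_keyspace my_table          # per-table read/write latency percentiles
    iostat -x 5                                         # iowait and disk queue depth
    grep GCInspector /var/log/cassandra/system.log | tail -20   # long GC pauses logged by Cassandra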

My colleagues spent a lot of time validating different AWS EBS configs (video from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4), 2.1 was faster in almost every case, but you’re using an instance size I don’t believe we tried (too little RAM to be viable in production).  c3.2xl only gives you 15G of ram – most “performance” based systems want 2-4x that (people running G1 heaps usually start at 16G heaps and leave another 16-30G for page cache), you’re running fairly small hardware – it’s possible that 2.1 isn’t “as good” on smaller hardware.

(I do see your domain, presumably you know all of this, but just to be sure):

You’re using c3, so presumably you’re using EBS – are you using GP2? Which volume sizes? Are they the same between versions? Are you hitting your iops limits? Running out of burst tokens? Do you have enhanced networking enabled? At load, what part of your system is stressed? Are you cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing memtable_allocation_type -> offheap objects  (available in 2.1, not in 2.0)?
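
As a sketch of that last change on a 2.1 node (the yaml path assumes a package install; roll it one node at a time):

    # set memtable_allocation_type: offheap_objects in cassandra.yaml
    sudo sed -i 's/^#\{0,1\} *memtable_allocation_type:.*/memtable_allocation_type: offheap_objects/' /etc/cassandra/cassandra.yaml
    sudo service cassandra restart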

Tuning gc_grace is weird – do you understand what it does? Are you overwriting or deleting a lot of data in your test (that’d be unusual)? Are you doing a lot of compaction?
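
A quick way to check those last points (table name is a placeholder):

    echo "DESCRIBE TABLE my_keyspace.my_table;" | cqlsh   # shows gc_grace_seconds and the compaction settings
    nodetool compactionstats                              # pending / active compactions
    nodetool cfstats my_keyspace.my_table                 # sstable counts and tombstone stats per table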


From: "Peddi, Praveen"
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, January 6, 2016 at 11:41 AM
To: "user@cassandra.apache.org"
Subject: Slow performance after upgrading from 2.0.9 to 2.1.11

Hi,
We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest environment with pretty much the same yaml settings in both (removed unused yaml settings and renamed a few others) and we have noticed that performance on 2.1.11 is worse compared to 2.0.9. After more investigation we found that the performance gets worse as we increase the replication factor on 2.1.11, whereas on 2.0.9 performance is more or less the same. Has anything architecturally changed as far as replication is concerned in 2.1.11?

All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we are obviously doing something different. However, the client code and load test are identical in both cases.

Details:
Nodes: 3 ec2 c3.2x large
R/W Consistency: QUORUM
Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and removed unused properties from the yaml file.
We run aggressive compaction with a low gc_grace (15 mins), but this is true for both 2.0.9 and 2.1.11.

As you can see, all p50, p90 and p99 latencies stayed within a 10% difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11 latencies almost doubled (reads especially are much slower than writes).

# Nodes         RF      # of rows       2.0.9   2.1.11
READ
                        P50     P90     P99     P50     P90     P99
3       1       450     306     594     747     425     849     1085
3       3       450     358     634     877     708     1274    2642

WRITE
3       1       10      26      80      179     37      131     196
3       3       10      31      96      184     46      166     468

Any pointers on how to debug performance issues will be appreciated.

Praveen



--
----------------------------------
Bestdata.be
Optimised ict
Tel: +32(0)496609576
Corry@BestData.be
----------------------------------




--
----------------------------------
Bestdata.be
Optimised ict
Tel: +32(0)496609576
Corry@BestData.be
----------------------------------




--
Dikang





--
Dikang



Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by Dikang Gu <di...@gmail.com>.
This is very expensive:

"MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10
tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
   java.lang.Thread.State: RUNNABLE
        at
org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
        at
org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
        at
org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:255)
        at
org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
        at
org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
        at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
        at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
        at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)


Checked the git history, it comes from this jira:
https://issues.apache.org/jira/browse/CASSANDRA-5417

Any thoughts?
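
To see how often that frame is on top on other nodes, a simple dump loop (per Corry's earlier suggestion, with jstack used here instead of kill -3 so each dump lands in its own file; the cassandra run-as user is an assumption):

    PID=$(pgrep -f CassandraDaemon | head -1)
    for i in $(seq 1 60); do
      sudo -u cassandra jstack "$PID" > "threaddump.$i.txt"
      sleep 2
    done
    grep -c 'IntegerType.compare' threaddump.*.txt | sort -t: -k2 -rn | head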

On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pa...@gmail.com>
wrote:

> Haven't seen this before, but perhaps it's related to CASSANDRA-10433?
> This is just a wild guess as it's in a related codepath, but maybe worth
> trying out the patch available to see if it helps anything...
>
> 2016-10-28 15:03 GMT-02:00 Dikang Gu <di...@gmail.com>:
>
>> We are seeing huge cpu regression when upgrading one of our 2.0.16
>> cluster to 2.1.14 as well. The 2.1.14 node is not able to handle the same
>> amount of read traffic as the 2.0.16 node, actually, it's less than 50%.
>>
>> And in the perf results, the first line could go as high as 50%, as we
>> turn up the read traffic, which never appeared in 2.0.16.
>>
>> Any thoughts?
>> Thanks
>>
>>
>> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
>> Overhead  Shared Object                          Symbol
>>    6.52%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/marshal/IntegerType;.compare in
>> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    4.84%  libzip.so                              [.] adler32
>>    2.88%  perf-196410.map                        [.]
>> Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marsh
>> al/IntegerType;.compare
>>    2.39%  perf-196410.map                        [.]
>> Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marsh
>> al/IntegerType;.findMostSignificantByte
>>    2.03%  perf-196410.map                        [.]
>> Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/Decor
>> atedKey;.compareTo
>>    1.65%  perf-196410.map                        [.] vtable chunks
>>    1.44%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/DecoratedKey;.compareTo in
>> Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>>    1.02%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>>    1.00%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x0000000000003804
>>    0.87%  perf-196410.map                        [.]
>> Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/Abstr
>> actCell$1;.computeNext
>>    0.82%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036dc
>>    0.79%  [kernel]                               [k]
>> copy_user_generic_string
>>    0.73%  perf-196410.map                        [.] vtable chunks
>>    0.71%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable
>> in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>>    0.70%  [kernel]                               [k] find_busiest_group
>>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>>    0.68%  perf-196410.map                        [.]
>> Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>>    0.65%  perf-196410.map                        [.]
>> jbyte_disjoint_arraycopy
>>    0.64%  [kernel]                               [k] _raw_spin_lock
>>    0.63%  [kernel]                               [k] __schedule
>>    0.45%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036df
>>
>> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <co...@bestdata.be>
>> wrote:
>>
>>> @JC, Get the pid of your target java process (something like "ps -ef |
>>> grep -i cassandra") .
>>> Then do a kill -3 <pid> (at unix/linux)
>>> Check the stdout logfile of the process.
>>>  it should contain the threaddump.
>>> If you found it, then great!
>>> Let that kill -3 loop for about 2 or 3 minutes.
>>> Herafter copy paste and load the stdout file into one if the mentioned
>>> tools.
>>> If you are not familiar with the java internals, then those threaddumps
>>> will learn you a lot:)
>>>
>>>
>>>
>>>
>>> Op vrijdag 29 januari 2016 heeft Jean Carlo <je...@gmail.com>
>>> het volgende geschreven:
>>>
>>>> I am having the same issue after upgrade cassandra 2.1.12 from 2.0.10.
>>>> I am not good on jvm so I would like to know how to do what @
>>>> CorryOpdenakker  propose with cassandra.
>>>>
>>>> :)
>>>>
>>>> I check concurrent_compactors
>>>>
>>>>
>>>> Saludos
>>>>
>>>> Jean Carlo
>>>>
>>>> "The best way to predict the future is to invent it" Alan Kay
>>>>
>>>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>> Cassandra is still new for me, but I have a lot of java tuning
>>>>> experience.
>>>>>
>>>>> For root cause detection of performance degradations its always good
>>>>> to start with collecting a series of java thread dumps. Take at problem
>>>>> occurrence using a loopscript for example 60 thread dumps with an interval
>>>>> of 1 or 2 seconds.
>>>>> Then load those dumps into IBM thread dump analyzer or in "eclipse
>>>>> mat" or any similar tool and see which methods appear to be most active or
>>>>> blocking others.
>>>>>
>>>>> Its really very useful
>>>>>
>>>>> Same can be be done in a normal situation to compare the difference.
>>>>>
>>>>> That should give more insights.
>>>>>
>>>>> Cheers, Corry
>>>>>
>>>>>
>>>>> Op vrijdag 29 januari 2016 heeft Peddi, Praveen <pe...@amazon.com>
>>>>> het volgende geschreven:
>>>>>
>>>>>> Hello,
>>>>>> We have another update on performance on 2.1.11.
>>>>>> compression_chunk_size  didn’t really help much but We changed
>>>>>> concurrent_compactors from default to 64 in 2.1.11 and read latencies
>>>>>> improved significantly. However, 2.1.11 read latencies are still 1.5 slower
>>>>>> than 2.0.9. One thing we noticed in JMX metric that could affect read
>>>>>> latencies is that 2.1.11 is running ReadRepairedBackground and
>>>>>> ReadRepairedBlocking too frequently compared to 2.0.9 even though our read_
>>>>>> repair_chance is same on both. Could anyone shed some light on why
>>>>>> 2.1.11 could be running read repair 10 to 50 times more in spite of same
>>>>>> configuration on both clusters?
>>>>>>
>>>>>> dclocal_read_repair_chance=0.100000 AND
>>>>>> read_repair_chance=0.000000 AND
>>>>>>
>>>>>> Here is the table for read repair metrics for both clusters.
>>>>>>                                    2.0.9    2.1.11
>>>>>> ReadRepairedBackground   5MinAvg   0.006    0.1
>>>>>>                         15MinAvg   0.009    0.153
>>>>>> ReadRepairedBlocking     5MinAvg   0.002    0.55
>>>>>>                         15MinAvg   0.007    0.91
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>>> Reply-To: <us...@cassandra.apache.org>
>>>>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Sorry I wasn’t as explicit as I should have been
>>>>>>
>>>>>> The same buffer size is used by compressed reads as well, but tuned
>>>>>> with compression_chunk_size table property. It’s likely true that if you
>>>>>> lower compression_chunk_size, you’ll see improved read performance.
>>>>>>
>>>>>> This was covered in the AWS re:Invent youtube link I sent in my
>>>>>> original reply.
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: "Peddi, Praveen"
>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>>>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>>>>> Cc: "Agrawal, Pratik"
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Hi,
>>>>>> We will try with reduced “rar_buffer_size” to 4KB. However
>>>>>> CASSANDRA-10249
>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this
>>>>>> only affects users who have 1. disabled compression, 2. switched to
>>>>>> buffered i/o from mmap’d”. None of this is true for us I believe. We use
>>>>>> default disk_access_mode which should be mmap. We also used
>>>>>> LZ4Compressor when created table.
>>>>>>
>>>>>> We will let you know if this property had any effect. We were testing
>>>>>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>>>>>> latest version.
>>>>>>
>>>>>> Praveen
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>>> Reply-To: <us...@cassandra.apache.org>
>>>>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>>>>> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org"
>>>>>> <us...@cassandra.apache.org>
>>>>>> Cc: "Agrawal, Pratik" <pa...@amazon.com>
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> This may be due to https://issues.apache.org/j
>>>>>> ira/browse/CASSANDRA-10249 / https://issues.apache.org/jira/
>>>>>> browse/CASSANDRA-8894 - whether or not this is really the case
>>>>>> depends on how much of your data is in page cache, and whether or not
>>>>>> you’re using mmap. Since the original question was asked by someone using
>>>>>> small RAM instances, it’s possible.
>>>>>>
>>>>>> We mitigate this by dropping compression_chunk_size in order to force
>>>>>> a smaller buffer on reads, so we don’t over read very small blocks. This
>>>>>> has other side effects (lower compression ratio, more garbage during
>>>>>> streaming), but significantly speeds up read workloads for us.
>>>>>>
>>>>>>
>>>>>> From: Zhiyan Shao
>>>>>> Date: Thursday, January 14, 2016 at 9:49 AM
>>>>>> To: "user@cassandra.apache.org"
>>>>>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>>>>>> forum, you can find another thread I sent a while ago. The perf test I did
>>>>>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>>>>>> 2.0.14.
>>>>>>
>>>>>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Jeff for your reply. Sorry for delayed response. We were
>>>>>>> running some more tests and wanted to wait for the results.
>>>>>>>
>>>>>>> So basically we saw higher CPU with 2.1.11 was higher compared to
>>>>>>> 2.0.9 (see below) for the same exact load test. Memory spikes were also
>>>>>>> aggressive on 2.1.11.
>>>>>>>
>>>>>>> So we wanted to rule out any of our custom setting so we ended up
>>>>>>> doing some testing with Cassandra stress test and default Cassandra
>>>>>>> installation. Here are the results we saw between 2.0.9 and 2.1.11. Both
>>>>>>> are default installations and both use Cassandra stress test with same
>>>>>>> params. This is the closest apple-apple comparison we can get. As you can
>>>>>>> see both read and write latencies are 30 to 50% worse in 2.1.11 than 2.0.9.
>>>>>>> Since we are using default installation.
>>>>>>>
>>>>>>> *Highlights of the test:*
>>>>>>> Load: 2x reads and 1x writes
>>>>>>> CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
>>>>>>>
>>>>>>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>>>>>>>
>>>>>>> Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
>>>>>>>
>>>>>>> *One observation is, As the number of threads are increased, 2.1.11
>>>>>>> read latencies are getting worse compared to 2.0.9 (see below table for 24
>>>>>>> threads vs 54 threads)*
>>>>>>> Not sure if anyone has done this kind of comparison before and what
>>>>>>> their thoughts are. I am thinking for this same reason
>>>>>>>
>>>>>>> 2.0.9 Plain  type       total ops     op/s     pk/s    row/s
>>>>>>> mean      med 0.95 0.99 0.999      max    time
>>>>>>>  16 threadCount  READ 66854 7205 7205 7205 1.6 1.3 2.8 3.5 9.6 85.3
>>>>>>> 9.3
>>>>>>>  16 threadCount  WRITE 33146 3572 3572 3572 1.3 1 2.6 3.3 7 206.5
>>>>>>> 9.3
>>>>>>>  16 threadCount  total 100000 10777 10777 10777 1.5 1.3 2.7 3.4 7.9
>>>>>>> 206.5 9.3
>>>>>>> 2.1.11 Plain
>>>>>>>  16 threadCount  READ 67096 6818 6818 6818 1.6 1.5 2.6 3.5 7.9 61.7
>>>>>>> 9.8
>>>>>>>  16 threadCount  WRITE 32904 3344 3344 3344 1.4 1.3 2.3 3 6.5 56.7
>>>>>>> 9.8
>>>>>>>  16 threadCount  total 100000 10162 10162 10162 1.6 1.4 2.5 3.2 6
>>>>>>> 61.7 9.8
>>>>>>> 2.0.9 Plain
>>>>>>>  24 threadCount  READ 66414 8167 8167 8167 2 1.6 3.7 7.5 16.7 208
>>>>>>> 8.1
>>>>>>>  24 threadCount  WRITE 33586 4130 4130 4130 1.7 1.3 3.4 5.4 25.6
>>>>>>> 45.4 8.1
>>>>>>>  24 threadCount  total 100000 12297 12297 12297 1.9 1.5 3.5 6.2 15.2
>>>>>>> 208 8.1
>>>>>>> 2.1.11 Plain
>>>>>>>  24 threadCount  READ 66628 7433 7433 7433 2.2 2.1 3.4 4.3 8.4 38.3
>>>>>>> 9
>>>>>>>  24 threadCount  WRITE 33372 3723 3723 3723 2 1.9 3.1 3.8 21.9 37.2
>>>>>>> 9
>>>>>>>  24 threadCount  total 100000 11155 11155 11155 2.1 2 3.3 4.1 8.8
>>>>>>> 38.3 9
>>>>>>> 2.0.9 Plain
>>>>>>>  54 threadCount  READ 67115 13419 13419 13419 2.8 2.6 4.2 6.4 36.9
>>>>>>> 82.4 5
>>>>>>>  54 threadCount  WRITE 32885 6575 6575 6575 2.5 2.3 3.9 5.6 15.9
>>>>>>> 81.5 5
>>>>>>>  54 threadCount  total 100000 19993 19993 19993 2.7 2.5 4.1 5.7 13.9
>>>>>>> 82.4 5
>>>>>>> 2.1.11 Plain
>>>>>>>  54 threadCount  READ 66780 8951 8951 8951 4.3 3.9 6.8 9.7 49.4 69.9
>>>>>>> 7.5
>>>>>>>  54 threadCount  WRITE 33220 4453 4453 4453 3.5 3.2 5.7 8.2 36.8 68
>>>>>>> 7.5
>>>>>>>  54 threadCount  total 100000 13404 13404 13404 4 3.7 6.6 9.2 48
>>>>>>> 69.9 7.5
>>>>>>>
>>>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>>>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
>>>>>>> Praveen <pe...@amazon.com>
>>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>>
>>>>>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>>>>>> experience was anywhere from 20-60%, depending on workload).
>>>>>>>
>>>>>>> However, it’s not necessarily true that everything behaves exactly
>>>>>>> the same – in particular, memtables are different, commitlog segment
>>>>>>> handling is different, and GC params may need to be tuned differently for
>>>>>>> 2.1 than 2.0.
>>>>>>>
>>>>>>> When the system is busy, what’s it actually DOING? Cassandra exposes
>>>>>>> a TON of metrics – have you plugged any into a reporting system to see
>>>>>>> what’s going on? Is your latency due to pegged cpu, iowait/disk queues or
>>>>>>> gc pauses?
>>>>>>>
>>>>>>> My colleagues spent a lot of time validating different AWS EBS
>>>>>>> configs (video from reinvent at https://www.youtube.com/wat
>>>>>>> ch?v=1R-mgOcOSd4), 2.1 was faster in almost every case, but you’re
>>>>>>> using an instance size I don’t believe we tried (too little RAM to be
>>>>>>> viable in production).  c3.2xl only gives you 15G of ram – most
>>>>>>> “performance” based systems want 2-4x that (people running G1 heaps usually
>>>>>>> start at 16G heaps and leave another 16-30G for page cache), you’re running
>>>>>>> fairly small hardware – it’s possible that 2.1 isn’t “as good” on smaller
>>>>>>> hardware.
>>>>>>>
>>>>>>> (I do see your domain, presumably you know all of this, but just to
>>>>>>> be sure):
>>>>>>>
>>>>>>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>>>>>>> Which volume sizes? Are they the same between versions? Are you hitting
>>>>>>> your iops limits? Running out of burst tokens? Do you have enhanced
>>>>>>> networking enabled? At load, what part of your system is stressed? Are you
>>>>>>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>>>>>>> memtable_allocation_type -> offheap objects  (available in 2.1, not in
>>>>>>> 2.0)?
>>>>>>>
>>>>>>> Tuning gc_grace is weird – do you understand what it does? Are you
>>>>>>> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
>>>>>>> you doing a lot of compaction?
>>>>>>>
>>>>>>>
>>>>>>> From: "Peddi, Praveen"
>>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>>>>>> To: "user@cassandra.apache.org"
>>>>>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>>
>>>>>>> Hi,
>>>>>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
>>>>>>> environment with pretty much same yaml settings in both (removed unused
>>>>>>> yaml settings and renamed few others) and we have noticed performance on
>>>>>>> 2.1.11 is worse compared to 2.0.9. *After more investigation we
>>>>>>> found that the performance gets worse as we increase replication factor on
>>>>>>> 2.1.11 where as on 2.0.9 performance is more or less same.* Has
>>>>>>> anything architecturally changed as far as replication is concerned in
>>>>>>> 2.1.11?
>>>>>>>
>>>>>>> All googling only suggested 2.1.11 should be FASTER than 2.0.9 so we
>>>>>>> are obviously doing something different. However the client code, load test
>>>>>>> is all identical in both cases.
>>>>>>>
>>>>>>> Details:
>>>>>>> Nodes: 3 ec2 c3.2x large
>>>>>>> R/W Consistency: QUORUM
>>>>>>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>>>>>>> removed unused properties from yaml file.
>>>>>>> We run compaction aggressive compaction with low gc_grace (15 mins)
>>>>>>> but this is true for both 2.0.9 and 2.1.11.
>>>>>>>
>>>>>>> As you can see, all p50, p90 and p99 latencies stayed with in 10%
>>>>>>> difference on 2.0.9 when we increased RF from 1 to 3, where as on 2.1.11
>>>>>>> latencies almost doubled (especially reads are much slower than writes).
>>>>>>>
>>>>>>> # Nodes  RF # of rows 2.0.9 2.1.11
>>>>>>> READ
>>>>>>>       P50 P90 P99 P50 P90 P99
>>>>>>> 3 1 450 306 594 747 425 849 1085
>>>>>>> 3 3 450 358 634 877 708 1274 2642
>>>>>>>
>>>>>>> WRITE
>>>>>>> 3 1 10 26 80 179 37 131 196
>>>>>>> 3 3 10 31 96 184 46 166 468
>>>>>>> Any pointers on how to debug performance issues will be appreciated.
>>>>>>>
>>>>>>> Praveen
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ----------------------------------
>>>>> Bestdata.be
>>>>> Optimised ict
>>>>> Tel:+32(0)496609576
>>>>> Corry@BestData.be
>>>>> ----------------------------------
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> ----------------------------------
>>> Bestdata.be
>>> Optimised ict
>>> Tel:+32(0)496609576
>>> Corry@BestData.be
>>> ----------------------------------
>>>
>>>
>>
>>
>> --
>> Dikang
>>
>>
>


-- 
Dikang

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Posted by Paulo Motta <pa...@gmail.com>.
Haven't seen this before, but perhaps it's related to CASSANDRA-10433? This
is just a wild guess as it's in a related codepath, but maybe worth trying
out the patch available to see if it helps anything...
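
In case it saves someone a step, trying a patch from that JIRA against the 2.1 branch is roughly (branch name and patch filename are assumptions; take whichever attachment matches your version):

    git clone https://github.com/apache/cassandra.git && cd cassandra
    git checkout cassandra-2.1
    git apply /path/to/10433-2.1.patch   # attachment downloaded from CASSANDRA-10433
    ant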

2016-10-28 15:03 GMT-02:00 Dikang Gu <di...@gmail.com>:

> We are seeing huge cpu regression when upgrading one of our 2.0.16 cluster
> to 2.1.14 as well. The 2.1.14 node is not able to handle the same amount of
> read traffic as the 2.0.16 node, actually, it's less than 50%.
>
> And in the perf results, the first line could go as high as 50%, as we
> turn up the read traffic, which never appeared in 2.0.16.
>
> Any thoughts?
> Thanks
>
>
> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
> Overhead  Shared Object                          Symbol
>    6.52%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/marshal/IntegerType;.compare in
> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    4.84%  libzip.so                              [.] adler32
>    2.88%  perf-196410.map                        [.]
> Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marsh
> al/IntegerType;.compare
>    2.39%  perf-196410.map                        [.]
> Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marsh
> al/IntegerType;.findMostSignificantByte
>    2.03%  perf-196410.map                        [.]
> Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/Decor
> atedKey;.compareTo
>    1.65%  perf-196410.map                        [.] vtable chunks
>    1.44%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/DecoratedKey;.compareTo in
> Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>    1.02%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>    1.00%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x0000000000003804
>    0.87%  perf-196410.map                        [.]
> Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/Abstr
> actCell$1;.computeNext
>    0.82%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036dc
>    0.79%  [kernel]                               [k]
> copy_user_generic_string
>    0.73%  perf-196410.map                        [.] vtable chunks
>    0.71%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in
> Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>    0.70%  [kernel]                               [k] find_busiest_group
>    0.69%  perf-196410.map                        [.] <80>H<F4>3<AE>^?
>    0.68%  perf-196410.map                        [.]
> Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>    0.65%  perf-196410.map                        [.]
> jbyte_disjoint_arraycopy
>    0.64%  [kernel]                               [k] _raw_spin_lock
>    0.63%  [kernel]                               [k] __schedule
>    0.45%  snappy-1.0.5.2-libsnappyjava.so        [.] 0x00000000000036df
>
> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker <co...@bestdata.be>
> wrote:
>
>> @JC, Get the pid of your target java process (something like "ps -ef |
>> grep -i cassandra") .
>> Then do a kill -3 <pid> (at unix/linux)
>> Check the stdout logfile of the process.
>>  it should contain the threaddump.
>> If you found it, then great!
>> Let that kill -3 loop for about 2 or 3 minutes.
>> Herafter copy paste and load the stdout file into one if the mentioned
>> tools.
>> If you are not familiar with the java internals, then those threaddumps
>> will learn you a lot:)
>>
>>
>>
>>
>> Op vrijdag 29 januari 2016 heeft Jean Carlo <je...@gmail.com>
>> het volgende geschreven:
>>
>>> I am having the same issue after upgrade cassandra 2.1.12 from 2.0.10. I
>>> am not good on jvm so I would like to know how to do what @
>>> CorryOpdenakker  propose with cassandra.
>>>
>>> :)
>>>
>>> I check concurrent_compactors
>>>
>>>
>>> Saludos
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker <co...@bestdata.be>
>>> wrote:
>>>
>>>> Hi guys,
>>>> Cassandra is still new for me, but I have a lot of java tuning
>>>> experience.
>>>>
>>>> For root cause detection of performance degradations its always good to
>>>> start with collecting a series of java thread dumps. Take at problem
>>>> occurrence using a loopscript for example 60 thread dumps with an interval
>>>> of 1 or 2 seconds.
>>>> Then load those dumps into IBM thread dump analyzer or in "eclipse mat"
>>>> or any similar tool and see which methods appear to be most active or
>>>> blocking others.
>>>>
>>>> Its really very useful
>>>>
>>>> Same can be be done in a normal situation to compare the difference.
>>>>
>>>> That should give more insights.
>>>>
>>>> Cheers, Corry
>>>>
>>>>
>>>> Op vrijdag 29 januari 2016 heeft Peddi, Praveen <pe...@amazon.com> het
>>>> volgende geschreven:
>>>>
>>>>> Hello,
>>>>> We have another update on performance on 2.1.11.
>>>>> compression_chunk_size  didn’t really help much but We changed
>>>>> concurrent_compactors from default to 64 in 2.1.11 and read latencies
>>>>> improved significantly. However, 2.1.11 read latencies are still 1.5 slower
>>>>> than 2.0.9. One thing we noticed in JMX metric that could affect read
>>>>> latencies is that 2.1.11 is running ReadRepairedBackground and
>>>>> ReadRepairedBlocking too frequently compared to 2.0.9 even though our read_
>>>>> repair_chance is same on both. Could anyone shed some light on why
>>>>> 2.1.11 could be running read repair 10 to 50 times more in spite of same
>>>>> configuration on both clusters?
>>>>>
>>>>> dclocal_read_repair_chance=0.100000 AND
>>>>> read_repair_chance=0.000000 AND
>>>>>
>>>>> Here is the table for read repair metrics for both clusters.
>>>>>                                    2.0.9    2.1.11
>>>>> ReadRepairedBackground   5MinAvg   0.006    0.1
>>>>>                         15MinAvg   0.009    0.153
>>>>> ReadRepairedBlocking     5MinAvg   0.002    0.55
>>>>>                         15MinAvg   0.007    0.91
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>> Reply-To: <us...@cassandra.apache.org>
>>>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Sorry I wasn’t as explicit as I should have been
>>>>>
>>>>> The same buffer size is used by compressed reads as well, but tuned
>>>>> with compression_chunk_size table property. It’s likely true that if you
>>>>> lower compression_chunk_size, you’ll see improved read performance.
>>>>>
>>>>> This was covered in the AWS re:Invent youtube link I sent in my
>>>>> original reply.
>>>>>
>>>>>
>>>>>
>>>>> From: "Peddi, Praveen"
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>>>> Cc: "Agrawal, Pratik"
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Hi,
>>>>> We will try with reduced “rar_buffer_size” to 4KB. However
>>>>> CASSANDRA-10249
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-10249> says "this
>>>>> only affects users who have 1. disabled compression, 2. switched to
>>>>> buffered i/o from mmap’d”. None of this is true for us I believe. We use
>>>>> default disk_access_mode which should be mmap. We also used
>>>>> LZ4Compressor when created table.
>>>>>
>>>>> We will let you know if this property had any effect. We were testing
>>>>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>>>>> latest version.
>>>>>
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>> Reply-To: <us...@cassandra.apache.org>
>>>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>>>> To: Zhiyan Shao <zh...@gmail.com>, "user@cassandra.apache.org" <
>>>>> user@cassandra.apache.org>
>>>>> Cc: "Agrawal, Pratik" <pa...@amazon.com>
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> This may be due to https://issues.apache.org/j
>>>>> ira/browse/CASSANDRA-10249 / https://issues.apache.org/jira/
>>>>> browse/CASSANDRA-8894 - whether or not this is really the case
>>>>> depends on how much of your data is in page cache, and whether or not
>>>>> you’re using mmap. Since the original question was asked by someone using
>>>>> small RAM instances, it’s possible.
>>>>>
>>>>> We mitigate this by dropping compression_chunk_size in order to force
>>>>> a smaller buffer on reads, so we don’t over read very small blocks. This
>>>>> has other side effects (lower compression ratio, more garbage during
>>>>> streaming), but significantly speeds up read workloads for us.
>>>>>
>>>>>
>>>>> From: Zhiyan Shao
>>>>> Date: Thursday, January 14, 2016 at 9:49 AM
>>>>> To: "user@cassandra.apache.org"
>>>>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>
>>>>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>>>>> forum, you can find another thread I sent a while ago. The perf test I did
>>>>> indicated that read is slower for 2.1.6 than 2.0.14 so we stayed with
>>>>> 2.0.14.
>>>>>
>>>>> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen <pe...@amazon.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Jeff for your reply. Sorry for delayed response. We were
>>>>>> running some more tests and wanted to wait for the results.
>>>>>>
>>>>>> So basically we saw higher CPU with 2.1.11 was higher compared to
>>>>>> 2.0.9 (see below) for the same exact load test. Memory spikes were also
>>>>>> aggressive on 2.1.11.
>>>>>>
>>>>>> So we wanted to rule out any of our custom setting so we ended up
>>>>>> doing some testing with Cassandra stress test and default Cassandra
>>>>>> installation. Here are the results we saw between 2.0.9 and 2.1.11. Both
>>>>>> are default installations and both use Cassandra stress test with same
>>>>>> params. This is the closest apple-apple comparison we can get. As you can
>>>>>> see both read and write latencies are 30 to 50% worse in 2.1.11 than 2.0.9.
>>>>>> Since we are using default installation.
>>>>>>
>>>>>> *Highlights of the test:*
>>>>>> Load: 2x reads and 1x writes
>>>>>> CPU:  2.0.9 (goes upto 25%)  compared to 2.1.11 (goes upto 60%)
>>>>>>
>>>>>> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
>>>>>>
>>>>>> Local write Latency: 0.033 ms for 2.0.9 Vs 0.030 ms for 2.1.11
>>>>>>
>>>>>> *One observation is, As the number of threads are increased, 2.1.11
>>>>>> read latencies are getting worse compared to 2.0.9 (see below table for 24
>>>>>> threads vs 54 threads)*
>>>>>> Not sure if anyone has done this kind of comparison before and what
>>>>>> their thoughts are. I am thinking for this same reason
>>>>>>
>>>>>>                  type    total ops    op/s    pk/s    row/s   mean  med  0.95  0.99  0.999    max  time
>>>>>> 2.0.9 Plain
>>>>>>  16 threadCount  READ       66854    7205    7205    7205    1.6   1.3   2.8   3.5    9.6   85.3   9.3
>>>>>>  16 threadCount  WRITE      33146    3572    3572    3572    1.3   1     2.6   3.3    7    206.5   9.3
>>>>>>  16 threadCount  total     100000   10777   10777   10777    1.5   1.3   2.7   3.4    7.9  206.5   9.3
>>>>>> 2.1.11 Plain
>>>>>>  16 threadCount  READ       67096    6818    6818    6818    1.6   1.5   2.6   3.5    7.9   61.7   9.8
>>>>>>  16 threadCount  WRITE      32904    3344    3344    3344    1.4   1.3   2.3   3      6.5   56.7   9.8
>>>>>>  16 threadCount  total     100000   10162   10162   10162    1.6   1.4   2.5   3.2    6     61.7   9.8
>>>>>> 2.0.9 Plain
>>>>>>  24 threadCount  READ       66414    8167    8167    8167    2     1.6   3.7   7.5   16.7  208     8.1
>>>>>>  24 threadCount  WRITE      33586    4130    4130    4130    1.7   1.3   3.4   5.4   25.6   45.4   8.1
>>>>>>  24 threadCount  total     100000   12297   12297   12297    1.9   1.5   3.5   6.2   15.2  208     8.1
>>>>>> 2.1.11 Plain
>>>>>>  24 threadCount  READ       66628    7433    7433    7433    2.2   2.1   3.4   4.3    8.4   38.3   9
>>>>>>  24 threadCount  WRITE      33372    3723    3723    3723    2     1.9   3.1   3.8   21.9   37.2   9
>>>>>>  24 threadCount  total     100000   11155   11155   11155    2.1   2     3.3   4.1    8.8   38.3   9
>>>>>> 2.0.9 Plain
>>>>>>  54 threadCount  READ       67115   13419   13419   13419    2.8   2.6   4.2   6.4   36.9   82.4   5
>>>>>>  54 threadCount  WRITE      32885    6575    6575    6575    2.5   2.3   3.9   5.6   15.9   81.5   5
>>>>>>  54 threadCount  total     100000   19993   19993   19993    2.7   2.5   4.1   5.7   13.9   82.4   5
>>>>>> 2.1.11 Plain
>>>>>>  54 threadCount  READ       66780    8951    8951    8951    4.3   3.9   6.8   9.7   49.4   69.9   7.5
>>>>>>  54 threadCount  WRITE      33220    4453    4453    4453    3.5   3.2   5.7   8.2   36.8   68     7.5
>>>>>>  54 threadCount  total     100000   13404   13404   13404    4     3.7   6.6   9.2   48     69.9   7.5
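>>>>>> For reference, numbers like the table above come from the bundled
>>>>>> stress tool; the invocation below is only a sketch using the 2.1-style
>>>>>> syntax (host and op count are placeholders, and 2.0 ships an older tool
>>>>>> with different flags):
>>>>>>
>>>>>>   tools/bin/cassandra-stress write n=100000 -node 127.0.0.1
>>>>>>   tools/bin/cassandra-stress mixed 'ratio(write=1,read=2)' n=100000 -node 127.0.0.1
>>>>>>
>>>>>> With no -rate option the tool sweeps thread counts (4, 8, 16, 24, 36,
>>>>>> 54, ...) and prints one READ/WRITE/total row per threadCount, which is
>>>>>> what the columns above show.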
>>>>>>
>>>>>> From: Jeff Jirsa <je...@crowdstrike.com>
>>>>>> Date: Thursday, January 7, 2016 at 1:01 AM
>>>>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>, Peddi
>>>>>> Praveen <pe...@amazon.com>
>>>>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
>>>>>> experience was anywhere from 20-60%, depending on workload).
>>>>>>
>>>>>> However, it’s not necessarily true that everything behaves exactly
>>>>>> the same – in particular, memtables are different, commitlog segment
>>>>>> handling is different, and GC params may need to be tuned differently for
>>>>>> 2.1 than 2.0.
>>>>>>
>>>>>> When the system is busy, what’s it actually DOING? Cassandra exposes
>>>>>> a TON of metrics – have you plugged any into a reporting system to see
>>>>>> what’s going on? Is your latency due to pegged cpu, iowait/disk queues or
>>>>>> gc pauses?
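>>>>>> (A quick first pass with the stock tooling – keyspace/table names and
>>>>>> log path below are placeholders:)
>>>>>>
>>>>>>   nodetool tpstats                    # pending/blocked/dropped per stage
>>>>>>   nodetool cfhistograms ks tbl        # local latencies, sstables per read
>>>>>>   nodetool proxyhistograms            # coordinator-level read/write latency
>>>>>>   nodetool compactionstats            # compaction backlog
>>>>>>   iostat -x 5                         # iowait and device queue depth
>>>>>>   grep GCInspector /var/log/cassandra/system.log   # long GC pauses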
>>>>>>
>>>>>> My colleagues spent a lot of time validating different AWS EBS
>>>>>> configs (video from re:Invent at https://www.youtube.com/watch?v=1R-mgOcOSd4),
>>>>>> 2.1 was faster in almost every case, but you’re
>>>>>> using an instance size I don’t believe we tried (too little RAM to be
>>>>>> viable in production).  c3.2xl only gives you 15G of ram – most
>>>>>> “performance” based systems want 2-4x that (people running G1 heaps usually
>>>>>> start at 16G heaps and leave another 16-30G for page cache), you’re running
>>>>>> fairly small hardware – it’s possible that 2.1 isn’t “as good” on smaller
>>>>>> hardware.
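>>>>>> (If you do go bigger, that sizing lives in cassandra-env.sh – the values
>>>>>> below are purely illustrative, not a recommendation for c3.2xl:)
>>>>>>
>>>>>>   MAX_HEAP_SIZE="16G"
>>>>>>   JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"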
>>>>>>
>>>>>> (I do see your domain, presumably you know all of this, but just to
>>>>>> be sure):
>>>>>>
>>>>>> You’re using c3, so presumably you’re using EBS – are you using GP2?
>>>>>> Which volume sizes? Are they the same between versions? Are you hitting
>>>>>> your iops limits? Running out of burst tokens? Do you have enhanced
>>>>>> networking enabled? At load, what part of your system is stressed? Are you
>>>>>> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
>>>>>> memtable_allocation_type -> offheap_objects (available in 2.1, not in
>>>>>> 2.0)?
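>>>>>> A minimal sketch of the 2.1-only cassandra.yaml settings (the off-heap
>>>>>> cap below is illustrative, not a recommendation):
>>>>>>
>>>>>>   memtable_allocation_type: offheap_objects
>>>>>>   memtable_offheap_space_in_mb: 2048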
>>>>>>
>>>>>> Tuning gc_grace is weird – do you understand what it does? Are you
>>>>>> overwriting or deleting a lot of data in your test (that’d be unusual)? Are
>>>>>> you doing a lot of compaction?
>>>>>>
>>>>>>
>>>>>> From: "Peddi, Praveen"
>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>> Date: Wednesday, January 6, 2016 at 11:41 AM
>>>>>> To: "user@cassandra.apache.org"
>>>>>> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>>>>
>>>>>> Hi,
>>>>>> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our load test
>>>>>> environment with pretty much the same yaml settings in both (we removed
>>>>>> unused yaml settings and renamed a few others), and we have noticed that
>>>>>> performance on 2.1.11 is worse than on 2.0.9. *After more investigation
>>>>>> we found that performance gets worse as we increase the replication
>>>>>> factor on 2.1.11, whereas on 2.0.9 performance stays more or less the
>>>>>> same.* Has anything changed architecturally as far as replication is
>>>>>> concerned in 2.1.11?
>>>>>>
>>>>>> All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we
>>>>>> are obviously doing something different. However, the client code and
>>>>>> load test are identical in both cases.
>>>>>>
>>>>>> Details:
>>>>>> Nodes: 3 EC2 c3.2xlarge
>>>>>> R/W Consistency: QUORUM
>>>>>> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
>>>>>> removed unused properties from the yaml file.
>>>>>> We run aggressive compaction with a low gc_grace (15 mins), but this is
>>>>>> true for both 2.0.9 and 2.1.11.
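>>>>>> For concreteness, the 15-minute gc_grace corresponds to a table-level
>>>>>> setting like this (table name is a placeholder):
>>>>>>
>>>>>>   ALTER TABLE ks.tbl WITH gc_grace_seconds = 900;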
>>>>>>
>>>>>> As you can see, all p50, p90 and p99 latencies stayed within a 10%
>>>>>> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
>>>>>> latencies almost doubled (reads are hit much harder than writes).
>>>>>>
>>>>>>                            ---- 2.0.9 ----    ---- 2.1.11 ----
>>>>>> # Nodes  RF  # of rows     P50   P90   P99    P50   P90    P99
>>>>>> READ
>>>>>>       3   1        450     306   594   747    425   849   1085
>>>>>>       3   3        450     358   634   877    708  1274   2642
>>>>>>
>>>>>> WRITE
>>>>>>       3   1         10      26    80   179     37   131    196
>>>>>>       3   3         10      31    96   184     46   166    468
>>>>>> Any pointers on how to debug performance issues will be appreciated.
>>>>>>
>>>>>> Praveen
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ----------------------------------
>>>> Bestdata.be
>>>> Optimised ict
>>>> Tel:+32(0)496609576
>>>> Corry@BestData.be
>>>> ----------------------------------
>>>>
>>>>
>>>
>>
>> --
>> ----------------------------------
>> Bestdata.be
>> Optimised ict
>> Tel:+32(0)496609576
>> Corry@BestData.be
>> ----------------------------------
>>
>>
>
>
> --
> Dikang
>
>