Posted to user@cassandra.apache.org by Yuan Fang <yu...@kryptoncloud.com> on 2016/07/07 20:25:47 UTC

Is my cluster normal?

I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory, and 600 GB SSD
EBS).
I can reach cluster-wide write throughput of 30k requests/second and read
throughput of about 100 requests/second. The cluster OS load is constantly
above 10. Are those numbers normal?

Thanks!


Best,

Yuan

Re: Is my cluster normal?

Posted by daemeon reiydelle <da...@gmail.com>.

Those numbers, as I suspected, line up pretty well with your AWS
configuration and network latencies within AWS. It is clear that this is a
WRITE ONLY test. You might want to do a mixed (e.g. 50% read, 50% write)
test for sanity. Note that the test will populate the data BEFORE it begins
doing the read/write tests.

In a dedicated environment at a recent client, with 10gbit links (just
grabbing one casstest run from my archives) I see less than twice the
above. Note your latency max is the result of a stop-the-world garbage
collection. There were huge problems below because this particular run was
using 24gb (Cassandra 2.x) java heap.

op rate                   : 21567 [WRITE:21567]
partition rate            : 21567 [WRITE:21567]
row rate                  : 21567 [WRITE:21567]
latency mean              : 9.3 [WRITE:9.3]
latency median            : 7.7 [WRITE:7.7]
latency 95th percentile   : 13.2 [WRITE:13.2]
latency 99th percentile   : 32.6 [WRITE:32.6]
latency 99.9th percentile : 97.2 [WRITE:97.2]
latency max               : 14906.1 [WRITE:14906.1]
Total partitions          : 83333333 [WRITE:83333333]
Total errors              : 0 [WRITE:0]
total gc count            : 705
total gc mb               : 1691132
total gc time (s)         : 30
avg gc time(ms)           : 43
stdev gc time(ms)         : 13
Total operation time      : 01:04:23
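
To put the "less than twice" comparison in numbers, here is a minimal Python sketch that parses the summary lines above together with the op rate from the AWS run quoted below. The helper names (parse_summary, hms_to_seconds) are made up for illustration; the figures are copied from the two runs in this thread.

```python
import re


def parse_summary(text):
    """Parse 'key : value' lines of a cassandra-stress summary into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # The key runs up to the first colon; the value is the first token after it.
        match = re.match(r"\s*(.+?)\s*:\s*(\S+)", line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Lines copied from the dedicated-environment run above.
DEDICATED = """op rate                   : 21567 [WRITE:21567]
total gc count            : 705
total gc time (s)         : 30
Total operation time      : 01:04:23"""

# Lines copied from the AWS run quoted in this thread.
AWS = """op rate                   : 12200 [WRITE:12200]
Total operation time      : 00:01:21"""

dedicated = parse_summary(DEDICATED)
aws = parse_summary(AWS)

ratio = int(dedicated["op rate"]) / int(aws["op rate"])
gc_share = int(dedicated["total gc time (s)"]) / hms_to_seconds(
    dedicated["total operation time"]
)

print(f"op rate ratio (dedicated / AWS): {ratio:.2f}")
print(f"GC share of run time (dedicated): {gc_share:.2%}")
```

On these figures the dedicated run reaches roughly 1.77 times the AWS op rate, and GC time is under 1% of the hour-long run, so the large latency max comes from a rare long pause rather than from steady GC overhead.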


.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Jul 7, 2016 at 2:51 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

> Yes, here is my stress test result:
> Results:
> op rate                   : 12200 [WRITE:12200]
> partition rate            : 12200 [WRITE:12200]
> row rate                  : 12200 [WRITE:12200]
> latency mean              : 16.4 [WRITE:16.4]
> latency median            : 7.1 [WRITE:7.1]
> latency 95th percentile   : 38.1 [WRITE:38.1]
> latency 99th percentile   : 204.3 [WRITE:204.3]
> latency 99.9th percentile : 465.9 [WRITE:465.9]
> latency max               : 1408.4 [WRITE:1408.4]
> Total partitions          : 1000000 [WRITE:1000000]
> Total errors              : 0 [WRITE:0]
> total gc count            : 0
> total gc mb               : 0
> total gc time (s)         : 0
> avg gc time(ms)           : NaN
> stdev gc time(ms)         : 0
> Total operation time      : 00:01:21
> END
>
> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>
>> Lots of variables you're leaving out.
>>
>> Depends on write size, if you're using logged batch or not, what
>> consistency level, what RF, if the writes come in bursts, etc, etc.
>> However, that's all sort of moot for determining "normal"; really you need a
>> baseline, as all those variables end up mattering a huge amount.
>>
>> I would suggest using Cassandra stress as a baseline and go from there
>> depending on what those numbers say (just pick the defaults).
>>
>> Sent from my iPhone
>>
>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>
>> yes, it is about 8k writes per node.
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>> wrote:
>>
>>> Are you saying 7k writes per node? or 30k writes per node?
>>>
>>>
>>> .......
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>>> writes 30k/second is the main thing.
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>>> wrote:
>>>>
>>>>> Assuming you meant 100k, that is likely for something with 16mb of
>>>>> storage (probably way small) where the data is more than 64k and hence
>>>>> will not fit into the row cache.
>>>>>
>>>>>
>>>>> .......
>>>>>
>>>>> Daemeon C.M. Reiydelle
>>>>> USA (+1) 415.501.0198
>>>>> London (+44) (0) 20 8144 9872
>>>>>
>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory, and
>>>>>> 600 GB SSD EBS).
>>>>>> I can reach cluster-wide write throughput of 30k requests/second and
>>>>>> read throughput of about 100 requests/second. The cluster OS load is
>>>>>> constantly above 10. Are those numbers normal?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yuan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

In addition, compaction seems to run very often. It happens every couple of
seconds, one after another. It seems to be causing the high load.

On Wed, Jul 13, 2016 at 10:32 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:

> $ nodetool tpstats
>
> ...
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> Native-Transport-Requests       128       128     1420623949         1         142821509
> ...
>
>
>
> What is this? Is it normal?
>
> On Tue, Jul 12, 2016 at 3:03 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>
>> Hi Jonathan,
>>
>> Here is the result:
>>
>> ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10
>> Linux 3.13.0-74-generic (ip-172-31-44-250) 07/12/2016 _x86_64_ (4 CPU)
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.01     2.13    0.74    1.55     0.01     0.02    27.77     0.00    0.74    0.89    0.66   0.43   0.10
>> xvdf              0.01     0.58  237.41   52.50    12.90     6.21   135.02     2.32    8.01    3.65   27.72   0.57  16.63
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     7.50    0.00    2.50     0.00     0.04    32.00     0.00    1.60    0.00    1.60   1.60   0.40
>> xvdf              0.00     0.00  353.50    0.00    24.12     0.00   139.75     0.49    1.37    1.37    0.00   0.58  20.60
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     2.00  463.50   35.00    30.69     2.86   137.84     0.88    1.77    1.29    8.17   0.60  30.00
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     0.00   99.50   36.00     8.54     4.40   195.62     1.55    3.88    1.45   10.61   1.06  14.40
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     5.00    0.00    1.50     0.00     0.03    34.67     0.00    1.33    0.00    1.33   1.33   0.20
>> xvdf              0.00     1.50  703.00  195.00    48.83    23.76   165.57     6.49    8.36    1.66   32.51   0.55  49.80
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     0.00    0.00    1.00     0.00     0.04    72.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     2.50  149.50   69.50    10.12     6.68   157.14     0.74    3.42    1.18    8.23   0.51  11.20
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     5.00    0.00    2.50     0.00     0.03    24.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     0.00   61.50   22.50     5.36     2.75   197.64     0.33    3.93    1.50   10.58   0.88   7.40
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     0.00    0.00    0.50     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     0.00  375.00    0.00    24.84     0.00   135.64     0.45    1.20    1.20    0.00   0.57  21.20
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     1.00    0.00    6.00     0.00     0.03     9.33     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     0.00  542.50   23.50    35.08     2.83   137.16     0.80    1.41    1.15    7.23   0.49  28.00
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> xvda              0.00     3.50    0.50    1.50     0.00     0.02    24.00     0.00    0.00    0.00    0.00   0.00   0.00
>> xvdf              0.00     1.50  272.00  153.50    16.18    18.67   167.73    14.32   33.66    1.39   90.84   0.81  34.60
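
A rough way to read the xvdf numbers above: r/s plus w/s approximates the IOPS the volume is serving in each sample. The Python sketch below compares that with the 1800 IOPS figure from Jeff's message quoted further down. The tuples are copied from the iostat output; treating r/s + w/s as what EBS actually meters is an assumption, since EBS may count large or merged requests differently.

```python
# (r/s, w/s) for xvdf, one tuple per iostat report above.
# The first report is the since-boot average that iostat prints first.
XVDF_SAMPLES = [
    (237.41, 52.50),
    (353.50, 0.00),
    (463.50, 35.00),
    (99.50, 36.00),
    (703.00, 195.00),
    (149.50, 69.50),
    (61.50, 22.50),
    (375.00, 0.00),
    (542.50, 23.50),
    (272.00, 153.50),
]

# Baseline quoted by Jeff for a 600G EBS volume (taken from the thread).
BASELINE_IOPS = 1800

iops_per_sample = [reads + writes for reads, writes in XVDF_SAMPLES]
peak_iops = max(iops_per_sample)
mean_iops = sum(iops_per_sample) / len(iops_per_sample)

print(f"peak: {peak_iops:.0f} IOPS ({peak_iops / BASELINE_IOPS:.0%} of baseline)")
print(f"mean: {mean_iops:.0f} IOPS ({mean_iops / BASELINE_IOPS:.0%} of baseline)")
```

By this estimate the busiest sample (703 reads plus 195 writes per second) uses about half of the 1800 IOPS baseline, so these ten samples alone do not show the volume being exhausted. A 20-second window is a small sample, though.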
>>
>>
>>
>> On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> When you have high system load it means your CPU is waiting for
>>> *something*, and in my experience it's usually slow disk.  A disk connected
>>> over network has been a culprit for me many times.
>>>
>>> On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com>
>>> wrote:
>>>
>>>> Can you run:
>>>>
>>>> iostat -dmx 2 10
>>>>
>>>>
>>>>
>>>> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> The read rate is low because we do not have many read operations
>>>>> right now.
>>>>>
>>>>> The heap is only 4GB.
>>>>>
>>>>> MAX_HEAP_SIZE=4GB
>>>>>
>>>>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
>>>>>
>>>>>> EBS iops scale with volume size.
>>>>>>
>>>>>>
>>>>>>
>>>>>> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting
>>>>>> those on writes, you’re going to suffer on reads.
>>>>>>
>>>>>>
>>>>>>
>>>>>> You have a 16G server, and probably a good chunk of that allocated to
>>>>>> heap. Consequently, you have almost no page cache, so your reads are going
>>>>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>>>>> page cache – the default settings for Cassandra (64k compression chunks)
>>>>>> are really inefficient for small reads served off of disk. If you drop the
>>>>>> compression chunk size (4k, for example), you’ll probably see your read
>>>>>> throughput increase significantly, which will give you more iops for
>>>>>> commitlog, so write throughput likely goes up, too.
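
The 1800 figure matches the gp2 baseline of 3 IOPS per GiB, and the chunk-size argument is easiest to see as read amplification. Here is a Python sketch of both calculations. It assumes the volume is gp2 and ignores gp2's minimum baseline and burst credits; the 1 KB row size is a hypothetical value chosen only for illustration, and the function names are made up.

```python
GP2_BASELINE_IOPS_PER_GIB = 3  # assumption: the 600G volume is gp2


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size (minimum and burst ignored)."""
    return size_gib * GP2_BASELINE_IOPS_PER_GIB


def read_amplification(chunk_kb, row_kb):
    """KB read and decompressed per KB of row data, assuming the row sits in one chunk."""
    return chunk_kb / row_kb


ROW_KB = 1  # hypothetical small row, not a figure from the thread

print(f"600 GiB gp2 baseline: {gp2_baseline_iops(600)} IOPS")
for chunk_kb in (64, 4):
    amplification = read_amplification(chunk_kb, ROW_KB)
    print(f"{chunk_kb:>2} KB chunks: {amplification:.0f}x the row size read per lookup")
```

With these assumptions a cold read of a 1 KB row pulls 64 KB from disk at the default chunk size and 4 KB at the smaller one, a 16-fold difference in bytes read and decompressed per lookup.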
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Jonathan Haddad <jo...@jonhaddad.com>
>>>>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>>>>> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>>> *Subject: *Re: Is my cluster normal?
>>>>>>
>>>>>>
>>>>>>
>>>>>> What's your CPU looking like? If it's low, check your IO with iostat
>>>>>> or dstat. I know some people have used EBS and say it's fine, but I've
>>>>>> been burned too many times.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Riccardo,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Very low IO-wait. About 0.3%.
>>>>>>
>>>>>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>>>>>> dropped messages.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>>>> MutationStage                     1         1      929509244         0                 0
>>>>>> ViewMutationStage                 0         0              0         0                 0
>>>>>> ReadStage                         4         0        4021570         0                 0
>>>>>> RequestResponseStage              0         0      731477999         0                 0
>>>>>> ReadRepairStage                   0         0         165603         0                 0
>>>>>> CounterMutationStage              0         0              0         0                 0
>>>>>> MiscStage                         0         0              0         0                 0
>>>>>> CompactionExecutor                2        55          92022         0                 0
>>>>>> MemtableReclaimMemory             0         0           1736         0                 0
>>>>>> PendingRangeCalculator            0         0              6         0                 0
>>>>>> GossipStage                       0         0         345474         0                 0
>>>>>> SecondaryIndexManagement          0         0              0         0                 0
>>>>>> HintsDispatcher                   0         0              4         0                 0
>>>>>> MigrationStage                    0         0             35         0                 0
>>>>>> MemtablePostFlush                 0         0           1973         0                 0
>>>>>> ValidationExecutor                0         0              0         0                 0
>>>>>> Sampler                           0         0              0         0                 0
>>>>>> MemtableFlushWriter               0         0           1736         0                 0
>>>>>> InternalResponseStage             0         0           5311         0                 0
>>>>>> AntiEntropyStage                  0         0              0         0                 0
>>>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>>>> Native-Transport-Requests       128       128      347508531         2          15891862
>>>>>>
>>>>>> Message type           Dropped
>>>>>> READ                         0
>>>>>> RANGE_SLICE                  0
>>>>>> _TRACE                       0
>>>>>> HINT                         0
>>>>>> MUTATION                     0
>>>>>> COUNTER_MUTATION             0
>>>>>> BATCH_STORE                  0
>>>>>> BATCH_REMOVE                 0
>>>>>> REQUEST_RESPONSE             0
>>>>>> PAGED_RANGE                  0
>>>>>> READ_REPAIR                  0
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Yuan,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Your instance has 4 vCPUs, that is 4 threads (not cores!). Aside from
>>>>>> any Cassandra-specific discussion, a system load of 10 on a 4-thread
>>>>>> machine is way too much in my opinion. If that is the running average
>>>>>> system load, I would look deeper into system details. Is that IO wait?
>>>>>> Is that stolen CPU? Is that a Cassandra-only instance, or are there
>>>>>> other processes pushing the load?
>>>>>>
>>>>>> What does your "nodetool tpstats" say? How many dropped messages do
>>>>>> you have?
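
The load-versus-threads comparison can be checked with a few lines of Python. This is only a sketch: load_per_cpu is a made-up helper, and reading a result above 1.0 as "more runnable or waiting tasks than hardware threads" is the usual rule of thumb, not a hard limit.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Load average divided by the number of hardware threads."""
    return load_average / cpu_count


# Figures from this thread: load of about 10 on a 4-vCPU m4.xlarge.
print(f"thread example: {load_per_cpu(10, 4):.1f} per vCPU")

# Same calculation for the machine this script runs on.
try:
    one_minute, _, _ = os.getloadavg()
    print(f"this host: {load_per_cpu(one_minute, os.cpu_count() or 1):.2f} per CPU")
except (AttributeError, OSError):
    # os.getloadavg() is not available on every platform.
    print("load average not available on this platform")
```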
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks Ben! Judging by the post, they got a slightly better but
>>>>>> similar result to mine. Good to know.
>>>>>>
>>>>>> I am not sure whether a little fine-tuning of heap memory will help
>>>>>> or not.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <
>>>>>> ben.slater@instaclustr.com> wrote:
>>>>>>
>>>>>> Hi Yuan,
>>>>>>
>>>>>>
>>>>>>
>>>>>> You might find this blog post a useful comparison:
>>>>>>
>>>>>>
>>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>>
>>>>>>
>>>>>>
>>>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>>>> also some single DC benchmarks of m4.xl clusters plus some discussion
>>>>>> of how we went about benchmarking.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Ben
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>>
>>>>>> Yes, here is my stress test result:
>>>>>>
>>>>>> Results:
>>>>>>
>>>>>> op rate                   : 12200 [WRITE:12200]
>>>>>>
>>>>>> partition rate            : 12200 [WRITE:12200]
>>>>>>
>>>>>> row rate                  : 12200 [WRITE:12200]
>>>>>>
>>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>>>
>>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>>>
>>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>>>
>>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>>>
>>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>>>
>>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>>>
>>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>>>
>>>>>> Total errors              : 0 [WRITE:0]
>>>>>>
>>>>>> total gc count            : 0
>>>>>>
>>>>>> total gc mb               : 0
>>>>>>
>>>>>> total gc time (s)         : 0
>>>>>>
>>>>>> avg gc time(ms)           : NaN
>>>>>>
>>>>>> stdev gc time(ms)         : 0
>>>>>>
>>>>>> Total operation time      : 00:01:21
>>>>>>
>>>>>> END
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>>>
>>>>>> Lots of variables you're leaving out.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Depends on write size, if you're using logged batch or not, what
>>>>>> consistency level, what RF, if the writes come in bursts, etc, etc.
>>>>>> However, that's all sort of moot for determining "normal"; really you
>>>>>> need a baseline, as all those variables end up mattering a huge amount.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would suggest using Cassandra stress as a baseline and go from
>>>>>> there depending on what those numbers say (just pick the defaults).
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>
>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>>
>>>>>> yes, it is about 8k writes per node.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> .......
>>>>>>
>>>>>> Daemeon C.M. Reiydelle
>>>>>> USA (+1) 415.501.0198
>>>>>> London (+44) (0) 20 8144 9872
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>> wrote:
>>>>>>
>>>>>> writes 30k/second is the main thing.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Assuming you meant 100k, that is likely for something with 16mb of
>>>>>> storage (probably way small) where the data is more than 64k and hence
>>>>>> will not fit into the row cache.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> .......
>>>>>>
>>>>>> Daemeon C.M. Reiydelle
>>>>>> USA (+1) 415.501.0198
>>>>>> London (+44) (0) 20 8144 9872
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory, and
>>>>>> 600 GB SSD EBS).
>>>>>>
>>>>>> I can reach cluster-wide write throughput of 30k requests/second and
>>>>>> read throughput of about 100 requests/second. The cluster OS load is
>>>>>> constantly above 10. Are those numbers normal?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yuan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ————————
>>>>>>
>>>>>> Ben Slater
>>>>>>
>>>>>> Chief Product Officer
>>>>>>
>>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>>>
>>>>>> +61 437 929 798
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>

Re: Is my cluster normal?

Posted by Romain Hardouin <ro...@yahoo.fr>.

Same behavior here with a very different setup. After an upgrade to 2.1.14 (from 2.0.17) I see a high load and many NTR "all time blocked". Offheap memtables lowered the blocked NTR for me; I put a comment on CASSANDRA-11363.
Best,
Romain
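
To make the NTR "all time blocked" figures easier to compare, here is a small Python sketch that parses the two Native-Transport-Requests rows posted in this thread and reports blocked requests as a share of completed ones. The helper names are made up, and using this ratio as a health indicator is my own rough heuristic rather than anything Cassandra defines.

```python
def parse_tpstats_row(line):
    """Split one 'nodetool tpstats' pool row into named integer fields."""
    name, *numbers = line.split()
    active, pending, completed, blocked, all_time_blocked = map(int, numbers)
    return {
        "pool": name,
        "active": active,
        "pending": pending,
        "completed": completed,
        "blocked": blocked,
        "all_time_blocked": all_time_blocked,
    }


def blocked_ratio(row):
    """All-time-blocked requests as a fraction of completed requests."""
    return row["all_time_blocked"] / row["completed"]


# Rows copied from the tpstats output posted on Jul 7 and Jul 13.
JUL_07 = "Native-Transport-Requests       128       128      347508531         2          15891862"
JUL_13 = "Native-Transport-Requests       128       128     1420623949         1         142821509"

for label, line in (("Jul 7", JUL_07), ("Jul 13", JUL_13)):
    row = parse_tpstats_row(line)
    print(f"{label}: {blocked_ratio(row):.2%} of completed requests were blocked at some point")
```

On the posted numbers the share rises from under 5% to about 10% between the two samples, which fits the picture of a request pool that is pinned at 128 active and 128 pending.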

On Wednesday, July 13, 2016 at 8:18 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
 

 Sometimes, the Pending can change from 128 to 129, 125 etc.

On Wed, Jul 13, 2016 at 10:32 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:

$ nodetool tpstats
...
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
Native-Transport-Requests       128       128     1420623949         1         142821509
...


What is this? Is it normal?
On Tue, Jul 12, 2016 at 3:03 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

Hi Jonathan,
Here is the result:
ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10Linux 3.13.0-74-generic (ip-172-31-44-250)  07/12/2016  _x86_64_ (4 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.01     2.13    0.74    1.55     0.01     0.02    27.77     0.00    0.74    0.89    0.66   0.43   0.10xvdf              0.01     0.58  237.41   52.50    12.90     6.21   135.02     2.32    8.01    3.65   27.72   0.57  16.63
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     7.50    0.00    2.50     0.00     0.04    32.00     0.00    1.60    0.00    1.60   1.60   0.40xvdf              0.00     0.00  353.50    0.00    24.12     0.00   139.75     0.49    1.37    1.37    0.00   0.58  20.60
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     2.00  463.50   35.00    30.69     2.86   137.84     0.88    1.77    1.29    8.17   0.60  30.00
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     0.00   99.50   36.00     8.54     4.40   195.62     1.55    3.88    1.45   10.61   1.06  14.40
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     5.00    0.00    1.50     0.00     0.03    34.67     0.00    1.33    0.00    1.33   1.33   0.20xvdf              0.00     1.50  703.00  195.00    48.83    23.76   165.57     6.49    8.36    1.66   32.51   0.55  49.80
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     0.00    0.00    1.00     0.00     0.04    72.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     2.50  149.50   69.50    10.12     6.68   157.14     0.74    3.42    1.18    8.23   0.51  11.20
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     5.00    0.00    2.50     0.00     0.03    24.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     0.00   61.50   22.50     5.36     2.75   197.64     0.33    3.93    1.50   10.58   0.88   7.40
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     0.00    0.00    0.50     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     0.00  375.00    0.00    24.84     0.00   135.64     0.45    1.20    1.20    0.00   0.57  21.20
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     1.00    0.00    6.00     0.00     0.03     9.33     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     0.00  542.50   23.50    35.08     2.83   137.16     0.80    1.41    1.15    7.23   0.49  28.00
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %utilxvda              0.00     3.50    0.50    1.50     0.00     0.02    24.00     0.00    0.00    0.00    0.00   0.00   0.00xvdf              0.00     1.50  272.00  153.50    16.18    18.67   167.73    14.32   33.66    1.39   90.84   0.81  34.60


On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

When you have high system load it means your CPU is waiting for *something*, and in my experience it's usually slow disk.  A disk connected over network has been a culprit for me many times.
On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

Can do you do:
iostat -dmx 2 10 


On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com> wrote:

Hi Jeff,
The read being low is because we do not have much read operations right now.
The heap is only 4GB.
MAX_HEAP_SIZE=4GB
On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com> wrote:

EBS iops scale with volume size. A 600G EBS volume only guarantees 1800 iops – if you’re exhausting those on writes, you’re going to suffer on reads. You have a 16G server, and probably a good chunk of that allocated to heap. Consequently, you have almost no page cache, so your reads are going to hit the disk. Your reads being very low is not uncommon if you have no page cache – the default settings for Cassandra (64k compression chunks) are really inefficient for small reads served off of disk. If you drop the compression chunk size (4k, for example), you’ll probably see your read throughput increase significantly, which will give you more iops for commitlog, so write throughput likely goes up, too.   From: Jonathan Haddad <jo...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 7, 2016 at 6:54 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Is my cluster normal? What's your CPU looking like? If it's low, check your IO with iostat or dstat. I know some people have used Ebs and say it's fine but ive been burned too many times. On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:
Hi Riccardo,  Very low IO-wait. About 0.3%.No stolen CPU. It is a casssandra only instance. I did not see any dropped messages.  ubuntu@cassandra1:/mnt/data$ nodetool tpstatsPool Name                    Active   Pending      Completed   Blocked  All time blockedMutationStage                     1         1      929509244         0                 0ViewMutationStage                 0         0              0         0                 0ReadStage                         4         0        4021570         0                 0RequestResponseStage              0         0      731477999         0                 0ReadRepairStage                   0         0         165603         0                 0CounterMutationStage              0         0              0         0                 0MiscStage                         0         0              0         0                 0CompactionExecutor                2        55          92022         0                 0MemtableReclaimMemory             0         0           1736         0                 0PendingRangeCalculator            0         0              6         0                 0GossipStage                       0         0         345474         0                 0SecondaryIndexManagement          0         0              0         0                 0HintsDispatcher                   0         0              4         0                 0MigrationStage                    0         0             35         0                 0MemtablePostFlush                 0         0           1973         0                 0ValidationExecutor                0         0              0         0                 0Sampler                           0         0              0         0                 0MemtableFlushWriter               0         0           1736         0                 0InternalResponseStage             0         0           5311         0                 0AntiEntropyStage                  0         0              0      
   0                 0CacheCleanupExecutor              0         0              0         0                 0Native-Transport-Requests       128       128      347508531         2          15891862 Message type           DroppedREAD                         0RANGE_SLICE                  0_TRACE                       0HINT                         0MUTATION                     0COUNTER_MUTATION             0BATCH_STORE                  0BATCH_REMOVE                 0REQUEST_RESPONSE             0PAGED_RANGE                  0READ_REPAIR                  0     On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com> wrote:
Hi Yuan,  You machine instance is 4 vcpus that is 4 threads (not cores!!!), aside from any Cassandra specific discussion a system load of 10 on a 4 threads machine is way too much in my opinion. If that is the running average system load I would look deeper into system details. Is that IO wait? Is that CPU Stolen? Is that a Cassandra only instance or are there other processes pushing the load?What does your "nodetool tpstats" say? Hoe many dropped messages do you have? Best, On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:
Thanks Ben! For the post, it seems they got a little better but similar result than i did. Good to know it. I am not sure if a little fine tuning of heap memory will help or not.    On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com> wrote:
Hi Yuan,  You might find this blog post a useful comparison:https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/ Although the focus is on Spark and Cassandra and multi-DC there are also some single DC benchmarks of m4.xl clusters plus some discussion of how we went about benchmarking. CheersBen  On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
Yes, here is my stress test result: Results:op rate                   : 12200 [WRITE:12200]partition rate            : 12200 [WRITE:12200]row rate                  : 12200 [WRITE:12200]latency mean              : 16.4 [WRITE:16.4]latency median            : 7.1 [WRITE:7.1]latency 95th percentile   : 38.1 [WRITE:38.1]latency 99th percentile   : 204.3 [WRITE:204.3]latency 99.9th percentile : 465.9 [WRITE:465.9]latency max               : 1408.4 [WRITE:1408.4]Total partitions          : 1000000 [WRITE:1000000]Total errors              : 0 [WRITE:0]total gc count            : 0total gc mb               : 0total gc time (s)         : 0avg gc time(ms)           : NaNstdev gc time(ms)         : 0Total operation time      : 00:01:21END On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
Lots of variables you're leaving out.

It depends on write size, whether you're using logged batches or not, what consistency level, what RF, whether the writes come in bursts, etc. However, that's all sort of moot for determining "normal"; really you need a baseline, as all those variables end up mattering a huge amount.

I would suggest using cassandra-stress as a baseline and going from there depending on what those numbers say (just pick the defaults).

Sent from my iPhone
On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
yes, it is about 8k writes per node.

On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com> wrote:
Are you saying 7k writes per node? or 30k writes per node?

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
writes 30k/second is the main thing.

On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com> wrote:
Assuming you meant 100k, that is likely for something with 16mb of storage (probably way small) where the data is more than 64k and hence will not fit into the row cache.

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB ssd EBS). I can reach a cluster wide write requests of 30k/second and read request about 100/second. The cluster OS load constantly above 10. Are those normal?

Thanks!

Best,
Yuan

--
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Sometimes the Pending count changes from 128 to 129, 125, etc.


On Wed, Jul 13, 2016 at 10:32 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:

> $ nodetool tpstats
>
> ...
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> Native-Transport-Requests       128       128     1420623949         1         142821509
> ...
>
>
>
> What is this? Is it normal?

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

$ nodetool tpstats

...
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
Native-Transport-Requests       128       128     1420623949         1         142821509
...



What is this? Is it normal?
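
For reference, a row like this can be checked mechanically. The Python sketch below is only an illustration: the helper names `parse_tpstats` and `flag_busy_pools` are hypothetical (they are not part of nodetool), and the sample rows are copied from the tpstats outputs posted in this thread. It flags any pool that has pending tasks or a non-zero "All time blocked" count. An Active value pinned at 128 looks like the pool sitting at its thread limit (128 is, as far as I know, the default for native_transport_max_threads in cassandra.yaml), but that reading is an assumption and worth confirming against your own configuration.

```python
def parse_tpstats(text):
    """Parse the pool rows of 'nodetool tpstats' output into dicts.

    Hypothetical helper for illustration; it keeps only rows that look like
    <name> <active> <pending> <completed> <blocked> <all time blocked>.
    """
    pools = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the dropped-message table, and blank lines.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        active, pending, completed, blocked, all_time_blocked = map(int, parts[1:])
        pools.append({
            "name": parts[0],
            "active": active,
            "pending": pending,
            "completed": completed,
            "blocked": blocked,
            "all_time_blocked": all_time_blocked,
        })
    return pools


def flag_busy_pools(pools):
    """Return names of pools with queued work or a history of blocked tasks."""
    return [p["name"] for p in pools
            if p["pending"] > 0 or p["all_time_blocked"] > 0]


# Sample rows taken from the outputs posted in this thread.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
ReadStage                         4         0        4021570         0                 0
Native-Transport-Requests       128       128     1420623949         1         142821509
"""

if __name__ == "__main__":
    for name in flag_busy_pools(parse_tpstats(SAMPLE)):
        print("needs a closer look:", name)
```

Running the same check on snapshots taken a few seconds apart would show whether the blocked count is still growing or is just historical.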

On Tue, Jul 12, 2016 at 3:03 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

> Hi Jonathan,
>
> Here is the result:
>
> ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10
> Linux 3.13.0-74-generic (ip-172-31-44-250) 07/12/2016 _x86_64_ (4 CPU)
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.01     2.13    0.74    1.55     0.01     0.02    27.77     0.00    0.74    0.89    0.66   0.43   0.10
> xvdf              0.01     0.58  237.41   52.50    12.90     6.21   135.02     2.32    8.01    3.65   27.72   0.57  16.63
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     7.50    0.00    2.50     0.00     0.04    32.00     0.00    1.60    0.00    1.60   1.60   0.40
> xvdf              0.00     0.00  353.50    0.00    24.12     0.00   139.75     0.49    1.37    1.37    0.00   0.58  20.60
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     2.00  463.50   35.00    30.69     2.86   137.84     0.88    1.77    1.29    8.17   0.60  30.00
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00   99.50   36.00     8.54     4.40   195.62     1.55    3.88    1.45   10.61   1.06  14.40
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     5.00    0.00    1.50     0.00     0.03    34.67     0.00    1.33    0.00    1.33   1.33   0.20
> xvdf              0.00     1.50  703.00  195.00    48.83    23.76   165.57     6.49    8.36    1.66   32.51   0.55  49.80
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     0.00    0.00    1.00     0.00     0.04    72.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     2.50  149.50   69.50    10.12     6.68   157.14     0.74    3.42    1.18    8.23   0.51  11.20
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     5.00    0.00    2.50     0.00     0.03    24.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00   61.50   22.50     5.36     2.75   197.64     0.33    3.93    1.50   10.58   0.88   7.40
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     0.00    0.00    0.50     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00  375.00    0.00    24.84     0.00   135.64     0.45    1.20    1.20    0.00   0.57  21.20
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     1.00    0.00    6.00     0.00     0.03     9.33     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     0.00  542.50   23.50    35.08     2.83   137.16     0.80    1.41    1.15    7.23   0.49  28.00
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> xvda              0.00     3.50    0.50    1.50     0.00     0.02    24.00     0.00    0.00    0.00    0.00   0.00   0.00
> xvdf              0.00     1.50  272.00  153.50    16.18    18.67   167.73    14.32   33.66    1.39   90.84   0.81  34.60
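
One way to read these samples is to derive the average read size per request and to pick out the intervals with slow writes. The Python sketch below is a rough illustration under stated assumptions: the column order is the one printed by `iostat -dmx` in this output, the three xvdf rows are copied from the samples above, the helper names are hypothetical, and the 20 ms `w_await` threshold is an arbitrary choice rather than a recommended limit.

```python
# Column order as printed by "iostat -dmx" in the output above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
           "svctm", "%util"]

W_AWAIT_LIMIT_MS = 20.0  # arbitrary threshold, chosen only for illustration


def parse_row(line):
    """Turn one iostat device row into a dict keyed by column name."""
    parts = line.split()
    row = {"device": parts[0]}
    for name, value in zip(COLUMNS, parts[1:]):
        row[name] = float(value)
    return row


def avg_read_kb(row):
    """Average size of a read request in KB (rMB/s divided by r/s)."""
    if row["r/s"] == 0:
        return 0.0
    return row["rMB/s"] * 1024 / row["r/s"]


def has_slow_writes(row, limit_ms=W_AWAIT_LIMIT_MS):
    """True when the average write wait exceeds the chosen threshold."""
    return row["w_await"] > limit_ms


# Three xvdf rows copied from the samples above (2nd, 5th and 10th interval).
SAMPLES = [
    "xvdf 0.00 0.00 353.50 0.00 24.12 0.00 139.75 0.49 1.37 1.37 0.00 0.58 20.60",
    "xvdf 0.00 1.50 703.00 195.00 48.83 23.76 165.57 6.49 8.36 1.66 32.51 0.55 49.80",
    "xvdf 0.00 1.50 272.00 153.50 16.18 18.67 167.73 14.32 33.66 1.39 90.84 0.81 34.60",
]

if __name__ == "__main__":
    for line in SAMPLES:
        row = parse_row(line)
        note = " (slow writes)" if has_slow_writes(row) else ""
        print(f"{row['device']}: avg read {avg_read_kb(row):.1f} KB, "
              f"w_await {row['w_await']:.2f} ms{note}")
```

On the read-only interval the average read works out to roughly 70 KB per request, which is at least consistent with reads of 64k compression chunks, although compaction reads could produce a similar figure.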
>
>
>
> On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> When you have high system load it means your CPU is waiting for
>> *something*, and in my experience it's usually slow disk.  A disk connected
>> over network has been a culprit for me many times.
>>
>> On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> Can you do:
>>>
>>> iostat -dmx 2 10
>>>
>>>
>>>
>>> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com>
>>> wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> Reads are low because we do not have many read operations right
>>>> now.
>>>>
>>>> The heap is only 4GB.
>>>>
>>>> MAX_HEAP_SIZE=4GB
>>>>
>>>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com>
>>>> wrote:
>>>>
>>>>> EBS iops scale with volume size.
>>>>>
>>>>>
>>>>>
>>>>> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting
>>>>> those on writes, you’re going to suffer on reads.
>>>>>
>>>>>
>>>>>
>>>>> You have a 16G server, and probably a good chunk of that allocated to
>>>>> heap. Consequently, you have almost no page cache, so your reads are going
>>>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>>>> page cache – the default settings for Cassandra (64k compression chunks)
>>>>> are really inefficient for small reads served off of disk. If you drop the
>>>>> compression chunk size (4k, for example), you’ll probably see your read
>>>>> throughput increase significantly, which will give you more iops for
>>>>> commitlog, so write throughput likely goes up, too.
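
The arithmetic behind those two points can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the ratio of 3 IOPS per GB is an assumption inferred from the 600G / 1800 iops figure quoted above, the function names are hypothetical, and the chunk calculation is an upper bound that treats every read as a cache miss pulling in one full uncompressed chunk.

```python
# Assumption: inferred from the thread's "600G volume -> 1800 iops" figure.
IOPS_PER_GB = 3


def baseline_iops(volume_gb, iops_per_gb=IOPS_PER_GB):
    """Baseline IOPS for a volume whose IOPS scale linearly with size."""
    return volume_gb * iops_per_gb


def chunk_read_mb_per_sec(reads_per_sec, chunk_kb):
    """Upper bound on data decompressed per second when every small read
    misses the page cache and pulls in one whole compression chunk."""
    return reads_per_sec * chunk_kb / 1024


if __name__ == "__main__":
    print(baseline_iops(600))              # 1800
    # 100 reads/second, as in the original question:
    print(chunk_read_mb_per_sec(100, 64))  # 6.25 MB/s with 64k chunks
    print(chunk_read_mb_per_sec(100, 4))   # 0.390625 MB/s with 4k chunks
```

The 16x gap between the last two numbers is the inefficiency being described: the same number of small reads moves far less data when the chunk size is closer to the row size.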
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From: *Jonathan Haddad <jo...@jonhaddad.com>
>>>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>>>> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>>> *Subject: *Re: Is my cluster normal?
>>>>>
>>>>>
>>>>>
>>>>> What's your CPU looking like? If it's low, check your IO with iostat
>>>>> or dstat. I know some people have used EBS and say it's fine, but I've been
>>>>> burned too many times.
>>>>>
>>>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>> Hi Riccardo,
>>>>>
>>>>>
>>>>>
>>>>> Very low IO-wait. About 0.3%.
>>>>>
>>>>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>>>>> dropped messages.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>>>
>>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>>> MutationStage                     1         1      929509244         0                 0
>>>>> ViewMutationStage                 0         0              0         0                 0
>>>>> ReadStage                         4         0        4021570         0                 0
>>>>> RequestResponseStage              0         0      731477999         0                 0
>>>>> ReadRepairStage                   0         0         165603         0                 0
>>>>> CounterMutationStage              0         0              0         0                 0
>>>>> MiscStage                         0         0              0         0                 0
>>>>> CompactionExecutor                2        55          92022         0                 0
>>>>> MemtableReclaimMemory             0         0           1736         0                 0
>>>>> PendingRangeCalculator            0         0              6         0                 0
>>>>> GossipStage                       0         0         345474         0                 0
>>>>> SecondaryIndexManagement          0         0              0         0                 0
>>>>> HintsDispatcher                   0         0              4         0                 0
>>>>> MigrationStage                    0         0             35         0                 0
>>>>> MemtablePostFlush                 0         0           1973         0                 0
>>>>> ValidationExecutor                0         0              0         0                 0
>>>>> Sampler                           0         0              0         0                 0
>>>>> MemtableFlushWriter               0         0           1736         0                 0
>>>>> InternalResponseStage             0         0           5311         0                 0
>>>>> AntiEntropyStage                  0         0              0         0                 0
>>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>>> Native-Transport-Requests       128       128      347508531         2          15891862
>>>>>
>>>>> Message type           Dropped
>>>>> READ                         0
>>>>> RANGE_SLICE                  0
>>>>> _TRACE                       0
>>>>> HINT                         0
>>>>> MUTATION                     0
>>>>> COUNTER_MUTATION             0
>>>>> BATCH_STORE                  0
>>>>> BATCH_REMOVE                 0
>>>>> REQUEST_RESPONSE             0
>>>>> PAGED_RANGE                  0
>>>>> READ_REPAIR                  0
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Yuan,
>>>>>
>>>>>
>>>>>
>>>>> Your machine instance has 4 vCPUs, which means 4 threads (not cores!).
>>>>> Aside from any Cassandra-specific discussion, a system load of 10 on a
>>>>> 4-thread machine is way too much in my opinion. If that is the running
>>>>> average system load, I would look deeper into the system details. Is that
>>>>> IO wait? Is that CPU steal? Is that a Cassandra-only instance, or are
>>>>> there other processes pushing the load?
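
The rule of thumb here is load average divided by the number of logical CPUs. A minimal Python sketch of that check follows; the function name `load_per_cpu` is hypothetical, and treating a ratio above 1.0 as a sign of queued work is a common convention rather than a hard limit. `os.getloadavg()` is only available on Unix-like systems, so the live reading is guarded.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Load average normalized by the number of logical CPUs."""
    return load_avg / cpu_count


if __name__ == "__main__":
    # The figures from this thread: load of 10 on a 4-vCPU instance.
    print(load_per_cpu(10.0, 4))  # 2.5

    # Live reading on the current host, where the platform supports it.
    if hasattr(os, "getloadavg"):
        try:
            one_min, five_min, fifteen_min = os.getloadavg()
            cpus = os.cpu_count() or 1
            print(f"1-minute load per CPU: {load_per_cpu(one_min, cpus):.2f}")
        except OSError:
            print("load average is not available on this host")
```

A sustained ratio of 2.5 means that, on average, more tasks are runnable or waiting than the instance has threads to run them, which is why the follow-up questions about IO wait and steal matter.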
>>>>>
>>>>> What does your "nodetool tpstats" say? How many dropped messages do
>>>>> you have?
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>> Thanks Ben! From the post, it seems they got slightly better but
>>>>> broadly similar results to mine. Good to know.
>>>>>
>>>>> I am not sure whether a little fine-tuning of heap memory will help or
>>>>> not.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>>>> wrote:
>>>>>
>>>>> Hi Yuan,
>>>>>
>>>>>
>>>>>
>>>>> You might find this blog post a useful comparison:
>>>>>
>>>>>
>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>
>>>>>
>>>>>
>>>>> Although the focus is on Spark and Cassandra and multi-DC, there are
>>>>> also some single-DC benchmarks of m4.xl clusters, plus some discussion
>>>>> of how we went about benchmarking.
>>>>>
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ben
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>
>>>>> Yes, here is my stress test result:
>>>>>
>>>>> Results:
>>>>> op rate                   : 12200 [WRITE:12200]
>>>>> partition rate            : 12200 [WRITE:12200]
>>>>> row rate                  : 12200 [WRITE:12200]
>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>> Total errors              : 0 [WRITE:0]
>>>>> total gc count            : 0
>>>>> total gc mb               : 0
>>>>> total gc time (s)         : 0
>>>>> avg gc time(ms)           : NaN
>>>>> stdev gc time(ms)         : 0
>>>>> Total operation time      : 00:01:21
>>>>> END
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>>
>>>>> Lots of variables you're leaving out.
>>>>>
>>>>>
>>>>>
>>>>> It depends on write size, whether you're using logged batches or not,
>>>>> what consistency level, what RF, whether the writes come in bursts, etc.
>>>>> However, that's all sort of moot for determining "normal"; really you
>>>>> need a baseline, as all those variables end up mattering a huge amount.
>>>>>
>>>>>
>>>>>
>>>>> I would suggest using cassandra-stress as a baseline and going from there
>>>>> depending on what those numbers say (just pick the defaults).
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>
>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>
>>>>> yes, it is about 8k writes per node.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>> writes 30k/second is the main thing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Assuming you meant 100k, that is likely for something with 16mb of
>>>>> storage (probably way small) where the data is more than 64k and hence
>>>>> will not fit into the row cache.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and
>>>>> 600GB ssd EBS).
>>>>>
>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>> those normal?
>>>>>
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>>
>>>>> Yuan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ————————
>>>>>
>>>>> Ben Slater
>>>>>
>>>>> Chief Product Officer
>>>>>
>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>>
>>>>> +61 437 929 798
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Hi Jonathan,

Here is the result:

ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10
Linux 3.13.0-74-generic (ip-172-31-44-250) 07/12/2016 _x86_64_ (4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.01     2.13    0.74    1.55     0.01     0.02    27.77     0.00    0.74    0.89    0.66   0.43   0.10
xvdf              0.01     0.58  237.41   52.50    12.90     6.21   135.02     2.32    8.01    3.65   27.72   0.57  16.63

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     7.50    0.00    2.50     0.00     0.04    32.00     0.00    1.60    0.00    1.60   1.60   0.40
xvdf              0.00     0.00  353.50    0.00    24.12     0.00   139.75     0.49    1.37    1.37    0.00   0.58  20.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     2.00  463.50   35.00    30.69     2.86   137.84     0.88    1.77    1.29    8.17   0.60  30.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    1.00     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00   99.50   36.00     8.54     4.40   195.62     1.55    3.88    1.45   10.61   1.06  14.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     5.00    0.00    1.50     0.00     0.03    34.67     0.00    1.33    0.00    1.33   1.33   0.20
xvdf              0.00     1.50  703.00  195.00    48.83    23.76   165.57     6.49    8.36    1.66   32.51   0.55  49.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    1.00     0.00     0.04    72.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     2.50  149.50   69.50    10.12     6.68   157.14     0.74    3.42    1.18    8.23   0.51  11.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     5.00    0.00    2.50     0.00     0.03    24.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00   61.50   22.50     5.36     2.75   197.64     0.33    3.93    1.50   10.58   0.88   7.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.50     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00  375.00    0.00    24.84     0.00   135.64     0.45    1.20    1.20    0.00   0.57  21.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     1.00    0.00    6.00     0.00     0.03     9.33     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     0.00  542.50   23.50    35.08     2.83   137.16     0.80    1.41    1.15    7.23   0.49  28.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     3.50    0.50    1.50     0.00     0.02    24.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00     1.50  272.00  153.50    16.18    18.67   167.73    14.32   33.66    1.39   90.84   0.81  34.60
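
As a rough reading aid, the xvdf rows above can be boiled down to a few
numbers. The sketch below is Python written for illustration and was not part
of the original post. The sample tuples are copied from the ten iostat samples,
and the 1800 IOPS baseline is the figure Jeff Jirsa quotes in this thread for a
600G EBS volume. Two caveats: the first iostat sample is an average since boot,
and iostat's r/s and w/s may not be counted the same way EBS counts IOPS, so
the ratio is only an approximation.

```python
# (r/s, w/s, w_await in ms, %util) for xvdf, copied from the ten iostat samples.
XVDF_SAMPLES = [
    (237.41, 52.50, 27.72, 16.63),  # first sample: average since boot
    (353.50, 0.00, 0.00, 20.60),
    (463.50, 35.00, 8.17, 30.00),
    (99.50, 36.00, 10.61, 14.40),
    (703.00, 195.00, 32.51, 49.80),
    (149.50, 69.50, 8.23, 11.20),
    (61.50, 22.50, 10.58, 7.40),
    (375.00, 0.00, 0.00, 21.20),
    (542.50, 23.50, 7.23, 28.00),
    (272.00, 153.50, 90.84, 34.60),
]

# Baseline quoted in the thread for a 600G volume (treated here as an assumption).
BASELINE_IOPS = 1800


def summarize(samples, baseline_iops):
    """Return peak request rate, its share of the baseline, and worst latencies."""
    requests_per_second = [reads + writes for reads, writes, _, _ in samples]
    peak = max(requests_per_second)
    return {
        "peak_iops": peak,
        "peak_fraction_of_baseline": peak / baseline_iops,
        "max_w_await_ms": max(sample[2] for sample in samples),
        "max_util_pct": max(sample[3] for sample in samples),
    }


summary = summarize(XVDF_SAMPLES, BASELINE_IOPS)
print(summary)
```

With these samples the peak is 898 requests/s (the fifth sample), roughly half
of the 1800 baseline, while the worst write await is 90.84 ms in the last
sample.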



On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> When you have high system load it means your CPU is waiting for
> *something*, and in my experience it's usually slow disk.  A disk connected
> over network has been a culprit for me many times.
>
> On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> Can you do:
>>
>> iostat -dmx 2 10
>>
>>
>>
>> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> The read being low is because we do not have much read operations right
>>> now.
>>>
>>> The heap is only 4GB.
>>>
>>> MAX_HEAP_SIZE=4GB
>>>
>>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com>
>>> wrote:
>>>
>>>> EBS iops scale with volume size.
>>>>
>>>>
>>>>
>>>> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting
>>>> those on writes, you’re going to suffer on reads.
>>>>
>>>>
>>>>
>>>> You have a 16G server, and probably a good chunk of that allocated to
>>>> heap. Consequently, you have almost no page cache, so your reads are going
>>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>>> page cache – the default settings for Cassandra (64k compression chunks)
>>>> are really inefficient for small reads served off of disk. If you drop the
>>>> compression chunk size (4k, for example), you’ll probably see your read
>>>> throughput increase significantly, which will give you more iops for
>>>> commitlog, so write throughput likely goes up, too.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Jonathan Haddad <jo...@jonhaddad.com>
>>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>>> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> *Subject: *Re: Is my cluster normal?
>>>>
>>>>
>>>>
>>>> What's your CPU looking like? If it's low, check your IO with iostat or
>>>> dstat. I know some people have used EBS and say it's fine, but I've been
>>>> burned too many times.
>>>>
>>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> Hi Riccardo,
>>>>
>>>>
>>>>
>>>> Very low IO-wait. About 0.3%.
>>>>
>>>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>>>> dropped messages.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>>
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     1         1      929509244         0                 0
>>>> ViewMutationStage                 0         0              0         0                 0
>>>> ReadStage                         4         0        4021570         0                 0
>>>> RequestResponseStage              0         0      731477999         0                 0
>>>> ReadRepairStage                   0         0         165603         0                 0
>>>> CounterMutationStage              0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> CompactionExecutor                2        55          92022         0                 0
>>>> MemtableReclaimMemory             0         0           1736         0                 0
>>>> PendingRangeCalculator            0         0              6         0                 0
>>>> GossipStage                       0         0         345474         0                 0
>>>> SecondaryIndexManagement          0         0              0         0                 0
>>>> HintsDispatcher                   0         0              4         0                 0
>>>> MigrationStage                    0         0             35         0                 0
>>>> MemtablePostFlush                 0         0           1973         0                 0
>>>> ValidationExecutor                0         0              0         0                 0
>>>> Sampler                           0         0              0         0                 0
>>>> MemtableFlushWriter               0         0           1736         0                 0
>>>> InternalResponseStage             0         0           5311         0                 0
>>>> AntiEntropyStage                  0         0              0         0                 0
>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>> Native-Transport-Requests       128       128      347508531         2          15891862
>>>>
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> HINT                         0
>>>> MUTATION                     0
>>>> COUNTER_MUTATION             0
>>>> BATCH_STORE                  0
>>>> BATCH_REMOVE                 0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Yuan,
>>>>
>>>>
>>>>
>>>> Your instance has 4 vCPUs, that is, 4 threads (not cores!). Aside from
>>>> any Cassandra-specific discussion, a system load of 10 on a 4-thread
>>>> machine is way too much in my opinion. If that is the running average
>>>> system load, I would look deeper into system details. Is that IO wait? Is
>>>> that stolen CPU? Is that a Cassandra-only instance, or are there other
>>>> processes pushing the load?
>>>>
>>>> What does your "nodetool tpstats" say? How many dropped messages do you
>>>> have?
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>> Thanks Ben! From the post, it seems they got slightly better but similar
>>>> results to mine. Good to know.
>>>>
>>>> I am not sure if a little fine tuning of heap memory will help or not.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>>> wrote:
>>>>
>>>> Hi Yuan,
>>>>
>>>>
>>>>
>>>> You might find this blog post a useful comparison:
>>>>
>>>>
>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>
>>>>
>>>>
>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>> also some single DC benchmarks of m4.xl
>>>> clusters plus some discussion of how we went about benchmarking.
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> Yes, here is my stress test result:
>>>>
>>>> Results:
>>>>
>>>> op rate                   : 12200 [WRITE:12200]
>>>>
>>>> partition rate            : 12200 [WRITE:12200]
>>>>
>>>> row rate                  : 12200 [WRITE:12200]
>>>>
>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>
>>>> latency median            : 7.1 [WRITE:7.1]
>>>>
>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>
>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>
>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>
>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>
>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>
>>>> Total errors              : 0 [WRITE:0]
>>>>
>>>> total gc count            : 0
>>>>
>>>> total gc mb               : 0
>>>>
>>>> total gc time (s)         : 0
>>>>
>>>> avg gc time(ms)           : NaN
>>>>
>>>> stdev gc time(ms)         : 0
>>>>
>>>> Total operation time      : 00:01:21
>>>>
>>>> END
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>
>>>> Lots of variables you're leaving out.
>>>>
>>>>
>>>>
>>>> It depends on write size, whether you're using logged batches, what
>>>> consistency level, what RF, whether the writes come in bursts, etc.
>>>> However, that's all sort of moot for determining "normal"; really, you
>>>> need a baseline, as all those variables end up mattering a huge amount.
>>>>
>>>>
>>>>
>>>> I would suggest using Cassandra stress as a baseline and go from there
>>>> depending on what those numbers say (just pick the defaults).
>>>>
>>>> Sent from my iPhone
>>>>
>>>>
>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> yes, it is about 8k writes per node.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>>> wrote:
>>>>
>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>> writes 30k/second is the main thing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>>> wrote:
>>>>
>>>> Assuming you meant 100k, that is likely for something with 16mb of storage
>>>> (probably way small) where the data is more than 64k and hence will not fit
>>>> into the row cache.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB
>>>> ssd EBS).
>>>>
>>>> I can reach a cluster wide write requests of 30k/second and read
>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>> those normal?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>> Yuan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ————————
>>>>
>>>> Ben Slater
>>>>
>>>> Chief Product Officer
>>>>
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>
>>>> +61 437 929 798
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>

Re: Is my cluster normal?

Posted by Riccardo Ferrari <fe...@gmail.com>.

While I'm surprised you don't have any dropped messages, I have to point the
finger at the following tpstats line:

Native-Transport-Requests       128       128      347508531         2          15891862

where the first '128' is the number of active requests and the second '128' is
the number of pending ones. The last column shows 15891862 requests blocked
over the node's lifetime. It might not be strictly related, but this might be
of interest:

https://issues.apache.org/jira/browse/CASSANDRA-11363

There's a chance that tuning the 'native_transport_*' options can
mitigate or solve the issue.
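
To make that reading of tpstats repeatable, here is a minimal Python sketch
that parses pool rows and flags any pool that has both pending work and a
non-zero "All time blocked" count. The rows are copied from the tpstats output
in this thread. The names PoolStats, parse_pool_row and looks_backed_up are
hypothetical helpers made up for this example, and the flagging rule is an
assumed heuristic, not an official Cassandra threshold.

```python
from typing import NamedTuple


class PoolStats(NamedTuple):
    """One row of the thread-pool table printed by `nodetool tpstats`."""
    name: str
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


def parse_pool_row(line: str) -> PoolStats:
    """Split a whitespace-separated pool row into a name and five counters."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    return PoolStats(fields[0], *(int(value) for value in fields[1:]))


def looks_backed_up(row: PoolStats) -> bool:
    """Assumed heuristic: work is queued now and requests have been blocked before."""
    return row.pending > 0 and row.all_time_blocked > 0


# Rows copied from the tpstats output quoted in this thread.
ROWS = [
    "MutationStage                     1         1      929509244         0                 0",
    "CompactionExecutor                2        55          92022         0                 0",
    "Native-Transport-Requests       128       128      347508531         2          15891862",
]

parsed = [parse_pool_row(row) for row in ROWS]
flagged = [row.name for row in parsed if looks_backed_up(row)]
print(flagged)
```

On these three rows only Native-Transport-Requests is flagged.
CompactionExecutor has 55 pending tasks but nothing blocked, so this particular
rule leaves it alone. Whether that backlog matters is a separate question.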

Best,

On Tue, Jul 12, 2016 at 9:34 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> When you have high system load it means your CPU is waiting for
> *something*, and in my experience it's usually slow disk.  A disk connected
> over network has been a culprit for me many times.
>
> On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> Can you do:
>>
>> iostat -dmx 2 10
>>
>>
>>
>> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> The read being low is because we do not have much read operations right
>>> now.
>>>
>>> The heap is only 4GB.
>>>
>>> MAX_HEAP_SIZE=4GB
>>>
>>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com>
>>> wrote:
>>>
>>>> EBS iops scale with volume size.
>>>>
>>>>
>>>>
>>>> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting
>>>> those on writes, you’re going to suffer on reads.
>>>>
>>>>
>>>>
>>>> You have a 16G server, and probably a good chunk of that allocated to
>>>> heap. Consequently, you have almost no page cache, so your reads are going
>>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>>> page cache – the default settings for Cassandra (64k compression chunks)
>>>> are really inefficient for small reads served off of disk. If you drop the
>>>> compression chunk size (4k, for example), you’ll probably see your read
>>>> throughput increase significantly, which will give you more iops for
>>>> commitlog, so write throughput likely goes up, too.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Jonathan Haddad <jo...@jonhaddad.com>
>>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>>> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>>> *Subject: *Re: Is my cluster normal?
>>>>
>>>>
>>>>
>>>> What's your CPU looking like? If it's low, check your IO with iostat or
>>>> dstat. I know some people have used EBS and say it's fine, but I've been
>>>> burned too many times.
>>>>
>>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> Hi Riccardo,
>>>>
>>>>
>>>>
>>>> Very low IO-wait. About 0.3%.
>>>>
>>>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>>>> dropped messages.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>>
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     1         1      929509244         0                 0
>>>> ViewMutationStage                 0         0              0         0                 0
>>>> ReadStage                         4         0        4021570         0                 0
>>>> RequestResponseStage              0         0      731477999         0                 0
>>>> ReadRepairStage                   0         0         165603         0                 0
>>>> CounterMutationStage              0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> CompactionExecutor                2        55          92022         0                 0
>>>> MemtableReclaimMemory             0         0           1736         0                 0
>>>> PendingRangeCalculator            0         0              6         0                 0
>>>> GossipStage                       0         0         345474         0                 0
>>>> SecondaryIndexManagement          0         0              0         0                 0
>>>> HintsDispatcher                   0         0              4         0                 0
>>>> MigrationStage                    0         0             35         0                 0
>>>> MemtablePostFlush                 0         0           1973         0                 0
>>>> ValidationExecutor                0         0              0         0                 0
>>>> Sampler                           0         0              0         0                 0
>>>> MemtableFlushWriter               0         0           1736         0                 0
>>>> InternalResponseStage             0         0           5311         0                 0
>>>> AntiEntropyStage                  0         0              0         0                 0
>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>> Native-Transport-Requests       128       128      347508531         2          15891862
>>>>
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> HINT                         0
>>>> MUTATION                     0
>>>> COUNTER_MUTATION             0
>>>> BATCH_STORE                  0
>>>> BATCH_REMOVE                 0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Yuan,
>>>>
>>>>
>>>>
>>>> Your instance has 4 vCPUs, that is, 4 threads (not cores!). Aside from
>>>> any Cassandra-specific discussion, a system load of 10 on a 4-thread
>>>> machine is way too much in my opinion. If that is the running average
>>>> system load, I would look deeper into system details. Is that IO wait? Is
>>>> that stolen CPU? Is that a Cassandra-only instance, or are there other
>>>> processes pushing the load?
>>>>
>>>> What does your "nodetool tpstats" say? How many dropped messages do you
>>>> have?
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>> Thanks Ben! From the post, it seems they got slightly better but similar
>>>> results to mine. Good to know.
>>>>
>>>> I am not sure if a little fine tuning of heap memory will help or not.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>>> wrote:
>>>>
>>>> Hi Yuan,
>>>>
>>>>
>>>>
>>>> You might find this blog post a useful comparison:
>>>>
>>>>
>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>
>>>>
>>>>
>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>> also some single DC benchmarks of m4.xl
>>>> clusters plus some discussion of how we went about benchmarking.
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> Yes, here is my stress test result:
>>>>
>>>> Results:
>>>>
>>>> op rate                   : 12200 [WRITE:12200]
>>>>
>>>> partition rate            : 12200 [WRITE:12200]
>>>>
>>>> row rate                  : 12200 [WRITE:12200]
>>>>
>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>
>>>> latency median            : 7.1 [WRITE:7.1]
>>>>
>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>
>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>
>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>
>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>
>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>
>>>> Total errors              : 0 [WRITE:0]
>>>>
>>>> total gc count            : 0
>>>>
>>>> total gc mb               : 0
>>>>
>>>> total gc time (s)         : 0
>>>>
>>>> avg gc time(ms)           : NaN
>>>>
>>>> stdev gc time(ms)         : 0
>>>>
>>>> Total operation time      : 00:01:21
>>>>
>>>> END
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>
>>>> Lots of variables you're leaving out.
>>>>
>>>>
>>>>
>>>> It depends on write size, whether you're using logged batches, what
>>>> consistency level, what RF, whether the writes come in bursts, etc.
>>>> However, that's all sort of moot for determining "normal"; really, you
>>>> need a baseline, as all those variables end up mattering a huge amount.
>>>>
>>>>
>>>>
>>>> I would suggest using Cassandra stress as a baseline and go from there
>>>> depending on what those numbers say (just pick the defaults).
>>>>
>>>> Sent from my iPhone
>>>>
>>>>
>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>> yes, it is about 8k writes per node.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>>> wrote:
>>>>
>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>> writes 30k/second is the main thing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>>> wrote:
>>>>
>>>> Assuming you meant 100k, that is likely for something with 16mb of storage
>>>> (probably way small) where the data is more than 64k and hence will not fit
>>>> into the row cache.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *.......Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872*
>>>>
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB
>>>> ssd EBS).
>>>>
>>>> I can reach a cluster wide write requests of 30k/second and read
>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>> those normal?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>> Yuan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ————————
>>>>
>>>> Ben Slater
>>>>
>>>> Chief Product Officer
>>>>
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>
>>>> +61 437 929 798
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>

Re: Is my cluster normal?

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

When you have high system load it means your CPU is waiting for
*something*, and in my experience it's usually slow disk.  A disk connected
over network has been a culprit for me many times.
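
For context on that point: on Linux the load average counts tasks that are
runnable plus tasks in uninterruptible sleep (typically waiting on disk I/O),
which is why a high load with modest CPU usage tends to point at storage. A
small Python sketch of the per-CPU arithmetic follows. The function names are
made up for illustration, and the 10 and 4 are the load and vCPU count
reported earlier in this thread.

```python
import os


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs (hardware threads)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def current_load_per_cpu() -> float:
    """Same ratio for the local machine, using the 1-minute load average (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)


# Figures from the thread: OS load around 10 on a 4-vCPU m4.xlarge.
thread_example = load_per_cpu(10, 4)
print(thread_example)
```

A ratio above 1 means more tasks are waiting (for CPU or I/O) than there are
CPUs to serve them. The thread's figures give 2.5.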

On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Can you do:
>
> iostat -dmx 2 10
>
>
>
> On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com> wrote:
>
>> Hi Jeff,
>>
>> The read being low is because we do not have much read operations right
>> now.
>>
>> The heap is only 4GB.
>>
>> MAX_HEAP_SIZE=4GB
>>
>> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com>
>> wrote:
>>
>>> EBS iops scale with volume size.
>>>
>>>
>>>
>>> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting those
>>> on writes, you’re going to suffer on reads.
>>>
>>>
>>>
>>> You have a 16G server, and probably a good chunk of that allocated to
>>> heap. Consequently, you have almost no page cache, so your reads are going
>>> to hit the disk. Your reads being very low is not uncommon if you have no
>>> page cache – the default settings for Cassandra (64k compression chunks)
>>> are really inefficient for small reads served off of disk. If you drop the
>>> compression chunk size (4k, for example), you’ll probably see your read
>>> throughput increase significantly, which will give you more iops for
>>> commitlog, so write throughput likely goes up, too.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Jonathan Haddad <jo...@jonhaddad.com>
>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>>> *To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> *Subject: *Re: Is my cluster normal?
>>>
>>>
>>>
>>> What's your CPU looking like? If it's low, check your IO with iostat or
>>> dstat. I know some people have used EBS and say it's fine, but I've been
>>> burned too many times.
>>>
>>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>> Hi Riccardo,
>>>
>>>
>>>
>>> Very low IO-wait. About 0.3%.
>>>
>>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>>> dropped messages.
>>>
>>>
>>>
>>>
>>>
>>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>>
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     1         1      929509244         0                 0
>>> ViewMutationStage                 0         0              0         0                 0
>>> ReadStage                         4         0        4021570         0                 0
>>> RequestResponseStage              0         0      731477999         0                 0
>>> ReadRepairStage                   0         0         165603         0                 0
>>> CounterMutationStage              0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> CompactionExecutor                2        55          92022         0                 0
>>> MemtableReclaimMemory             0         0           1736         0                 0
>>> PendingRangeCalculator            0         0              6         0                 0
>>> GossipStage                       0         0         345474         0                 0
>>> SecondaryIndexManagement          0         0              0         0                 0
>>> HintsDispatcher                   0         0              4         0                 0
>>> MigrationStage                    0         0             35         0                 0
>>> MemtablePostFlush                 0         0           1973         0                 0
>>> ValidationExecutor                0         0              0         0                 0
>>> Sampler                           0         0              0         0                 0
>>> MemtableFlushWriter               0         0           1736         0                 0
>>> InternalResponseStage             0         0           5311         0                 0
>>> AntiEntropyStage                  0         0              0         0                 0
>>> CacheCleanupExecutor              0         0              0         0                 0
>>> Native-Transport-Requests       128       128      347508531         2          15891862
>>>
>>> Message type           Dropped
>>> READ                         0
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> HINT                         0
>>> MUTATION                     0
>>> COUNTER_MUTATION             0
>>> BATCH_STORE                  0
>>> BATCH_REMOVE                 0
>>> REQUEST_RESPONSE             0
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>>> wrote:
>>>
>>> Hi Yuan,
>>>
>>>
>>>
>>> You machine instance is 4 vcpus that is 4 threads (not cores!!!), aside
>>> from any Cassandra specific discussion a system load of 10 on a 4 threads
>>> machine is way too much in my opinion. If that is the running average
>>> system load I would look deeper into system details. Is that IO wait? Is
>>> that CPU Stolen? Is that a Cassandra only instance or are there other
>>> processes pushing the load?
>>>
>>> What does your "nodetool tpstats" say? Hoe many dropped messages do you
>>> have?
>>>
>>>
>>>
>>> Best,
>>>
>>>
>>>
>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>> wrote:
>>>
>>> Thanks Ben! For the post, it seems they got a little better but similar
>>> result than i did. Good to know it.
>>>
>>> I am not sure if a little fine tuning of heap memory will help or not.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>> wrote:
>>>
>>> Hi Yuan,
>>>
>>>
>>>
>>> You might find this blog post a useful comparison:
>>>
>>>
>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instaclustr.com_blog_2016_01_07_multi-2Ddata-2Dcenter-2Dapache-2Dspark-2Dand-2Dapache-2Dcassandra-2Dbenchmark_&d=CwMFaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=Ltg5YUTZbI4Ixf7UjzKW636Llz6zXXurTveCLptZwio&s=MU4-NWBjvVO95HnxQtkYk4xkApq4X4IiVy8tPCgj4KU&e=>
>>>
>>>
>>>
>>> Although the focus is on Spark and Cassandra and multi-DC there are also
>>> some single DC benchmarks of m4.xl
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__m4.xl&d=CwQFaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=Ltg5YUTZbI4Ixf7UjzKW636Llz6zXXurTveCLptZwio&s=m3DfZk3YOaf0W2OvACsqDWXp-vdlkP-cC0WnEouZwkk&e=>
>>> clusters plus some discussion of how we went about benchmarking.
>>>
>>>
>>>
>>> Cheers
>>>
>>> Ben
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>> Yes, here is my stress test result:
>>>
>>> Results:
>>>
>>> op rate                   : 12200 [WRITE:12200]
>>>
>>> partition rate            : 12200 [WRITE:12200]
>>>
>>> row rate                  : 12200 [WRITE:12200]
>>>
>>> latency mean              : 16.4 [WRITE:16.4]
>>>
>>> latency median            : 7.1 [WRITE:7.1]
>>>
>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>
>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>
>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>
>>> latency max               : 1408.4 [WRITE:1408.4]
>>>
>>> Total partitions          : 1000000 [WRITE:1000000]
>>>
>>> Total errors              : 0 [WRITE:0]
>>>
>>> total gc count            : 0
>>>
>>> total gc mb               : 0
>>>
>>> total gc time (s)         : 0
>>>
>>> avg gc time(ms)           : NaN
>>>
>>> stdev gc time(ms)         : 0
>>>
>>> Total operation time      : 00:01:21
>>>
>>> END
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>
>>> Lots of variables you're leaving out.
>>>
>>>
>>>
>>> Depends on write size, if you're using logged batch or not, what
>>> consistency level, what RF, if the writes come in bursts, etc, etc.
>>> However, that's all sort of moot for determining "normal" really you need a
>>> baseline as all those variables end up mattering a huge amount.
>>>
>>>
>>>
>>> I would suggest using Cassandra stress as a baseline and go from there
>>> depending on what those numbers say (just pick the defaults).
>>>
>>> Sent from my iPhone
>>>
>>>
>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>> yes, it is about 8k writes per node.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>> wrote:
>>>
>>> Are you saying 7k writes per node? or 30k writes per node?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *.......Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>> writes 30k/second is the main thing.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com>
>>> wrote:
>>>
>>> Assuming you meant 100k, that likely for something with 16mb of storage
>>> (probably way small) where the data is more that 64k hence will not fit
>>> into the row cache.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *.......Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>>
>>>
>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB
>>> ssd EBS).
>>>
>>> I can reach a cluster wide write requests of 30k/second and read request
>>> about 100/second. The cluster OS load constantly above 10. Are those normal?
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>> Best,
>>>
>>>
>>>
>>> Yuan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> ————————
>>>
>>> Ben Slater
>>>
>>> Chief Product Officer
>>>
>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>
>>> +61 437 929 798
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>

Re: Is my cluster normal?

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Can you run:

iostat -dmx 2 10



On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yu...@kryptoncloud.com> wrote:

> Hi Jeff,
>
> The read rate is low because we do not have many read operations right
> now.
>
> The heap is only 4GB.
>
> MAX_HEAP_SIZE=4GB
>

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Hi Jeff,

The read rate is low because we do not have many read operations right now.

The heap is only 4GB.

MAX_HEAP_SIZE=4GB

On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <je...@crowdstrike.com>
wrote:

> EBS iops scale with volume size.
>
>
>
> A 600G EBS volume only guarantees 1800 iops – if you’re exhausting those
> on writes, you’re going to suffer on reads.
>
>
>
> You have a 16G server, and probably a good chunk of that allocated to
> heap. Consequently, you have almost no page cache, so your reads are going
> to hit the disk. Your reads being very low is not uncommon if you have no
> page cache – the default settings for Cassandra (64k compression chunks)
> are really inefficient for small reads served off of disk. If you drop the
> compression chunk size (4k, for example), you’ll probably see your read
> throughput increase significantly, which will give you more iops for
> commitlog, so write throughput likely goes up, too.
>

Re: Is my cluster normal?

Posted by Jeff Jirsa <je...@crowdstrike.com>.

EBS iops scale with volume size.

 

A 600G EBS volume only guarantees 1800 iops – if you’re exhausting those on writes, you’re going to suffer on reads.
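
The 1800 figure is consistent with a baseline of 3 IOPS per provisioned GiB. A minimal Python sketch of that arithmetic, assuming the volume is a gp2 (General Purpose SSD) volume, which the thread does not state; the function name, the 100 IOPS floor, and the default ratio are illustrative assumptions, and any per-volume upper cap is left out:

```python
def gp2_baseline_iops(size_gib: int, iops_per_gib: int = 3, floor: int = 100) -> int:
    """Estimate baseline IOPS for a volume, assuming gp2-style scaling.

    Assumption: baseline grows linearly with size and never drops below
    a small floor. The upper cap is deliberately not modeled here.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return max(floor, size_gib * iops_per_gib)


# The 600 GB volume from this thread.
print(gp2_baseline_iops(600))
```

Under these assumptions, growing the volume is one way to raise the baseline without changing the instance type.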

 

You have a 16G server, and probably a good chunk of that allocated to heap. Consequently, you have almost no page cache, so your reads are going to hit the disk. Your reads being very low is not uncommon if you have no page cache – the default settings for Cassandra (64k compression chunks) are really inefficient for small reads served off of disk. If you drop the compression chunk size (4k, for example), you’ll probably see your read throughput increase significantly, which will give you more iops for commitlog, so write throughput likely goes up, too.
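
To make the chunk-size argument concrete, here is a rough Python sketch of read amplification when a small row is served from disk. It assumes a hypothetical 1 KB row that sits inside a single chunk, and it counts uncompressed chunk bytes, so it is an approximation rather than a measurement:

```python
import math


def bytes_read_per_row(row_bytes: int, chunk_kb: int) -> int:
    """Uncompressed bytes that must be read and decompressed to serve one row.

    Assumption: the row is aligned so that it spans the minimum number of
    chunks; a row straddling a chunk boundary would cost one chunk more.
    """
    chunk_bytes = chunk_kb * 1024
    return math.ceil(row_bytes / chunk_bytes) * chunk_bytes


ROW_BYTES = 1024  # hypothetical small row, not taken from the thread

for chunk_kb in (64, 4):
    read = bytes_read_per_row(ROW_BYTES, chunk_kb)
    print(f"{chunk_kb}k chunks: {read} bytes read, {read // ROW_BYTES}x amplification")
```

On Cassandra 3.x the chunk size is the chunk_length_in_kb option in a table's compression settings (for example, compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4} in an ALTER TABLE statement). Existing SSTables keep their old chunk size until they are rewritten.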

 

 

 

From: Jonathan Haddad <jo...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 7, 2016 at 6:54 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Is my cluster normal?

 

What's your CPU looking like? If it's low, check your IO with iostat or dstat. I know some people have used EBS and say it's fine, but I've been burned too many times.

On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:

Hi Riccardo, 

 

Very low IO-wait. About 0.3%.

No stolen CPU. It is a Cassandra-only instance. I did not see any dropped messages.

 

 

ubuntu@cassandra1:/mnt/data$ nodetool tpstats

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         4         0        4021570         0                 0
RequestResponseStage              0         0      731477999         0                 0
ReadRepairStage                   0         0         165603         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                2        55          92022         0                 0
MemtableReclaimMemory             0         0           1736         0                 0
PendingRangeCalculator            0         0              6         0                 0
GossipStage                       0         0         345474         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0              4         0                 0
MigrationStage                    0         0             35         0                 0
MemtablePostFlush                 0         0           1973         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           1736         0                 0
InternalResponseStage             0         0           5311         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
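
Nothing is dropped in the output above, but the Blocked and All time blocked columns are worth reading too. A small Python sketch that scans tpstats-style text for pools with blocked tasks; the parsing is an assumption based on the column layout shown in this thread, the function name is made up, and the sample is a three-row excerpt of the output above:

```python
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
CompactionExecutor                2        55          92022         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862
"""


def pools_with_blocked_tasks(tpstats_text: str):
    """Return (pool, pending, all_time_blocked) for pools that ever blocked.

    Assumption: a data row is a pool name followed by exactly five
    integer columns; anything else (header, blank lines) is skipped.
    """
    flagged = []
    for line in tpstats_text.splitlines():
        parts = line.split()
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        name = parts[0]
        _active, pending, _completed, blocked, all_time_blocked = map(int, parts[1:])
        if blocked > 0 or all_time_blocked > 0:
            flagged.append((name, pending, all_time_blocked))
    return flagged


for name, pending, all_time_blocked in pools_with_blocked_tasks(SAMPLE):
    print(f"{name}: pending={pending}, all time blocked={all_time_blocked}")
```

On this excerpt only Native-Transport-Requests is flagged. That may indicate that clients are offering more concurrent requests than the request threads can accept, although the thread itself does not explore this.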

 

 

 

 

 

On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com> wrote:

Hi Yuan, 

 

Your instance has 4 vCPUs, that is, 4 threads (not cores!). Aside from any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is way too much in my opinion. If that is the running average system load, I would look deeper into the system details. Is that IO wait? Is that stolen CPU? Is that a Cassandra-only instance, or are there other processes pushing the load?

What does your "nodetool tpstats" say? How many dropped messages do you have?
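
The load comparison above can be written as load average divided by the number of hardware threads. A minimal Python sketch; the function name is illustrative, and the "above 1.0 means runnable work is queuing" reading is a rule of thumb, not a Cassandra-specific threshold:

```python
import os


def load_per_cpu(load_avg: float, cpus: int) -> float:
    """Normalize a load average by the number of CPUs (hardware threads)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load_avg / cpus


# Figures from this thread: load of 10 on a 4-vCPU instance.
print(load_per_cpu(10.0, 4))

# The same check against the local machine, where the OS supports it.
try:
    one_minute, _, _ = os.getloadavg()
    print(load_per_cpu(one_minute, os.cpu_count() or 1))
except (AttributeError, OSError):
    print("load average not available on this platform")
```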

 

Best,

 

On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:

Thanks Ben! From the post, it seems they got slightly better but broadly similar results to mine. Good to know.

I am not sure whether a little fine-tuning of heap memory will help or not.

 

 

On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com> wrote:

Hi Yuan, 

 

You might find this blog post a useful comparison:

https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/

 

Although the focus is on Spark and Cassandra and multi-DC there are also some single DC benchmarks of m4.xl clusters plus some discussion of how we went about benchmarking.

 

Cheers

Ben

 

 

On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:

Yes, here is my stress test result: 

Results:
op rate                   : 12200 [WRITE:12200]
partition rate            : 12200 [WRITE:12200]
row rate                  : 12200 [WRITE:12200]
latency mean              : 16.4 [WRITE:16.4]
latency median            : 7.1 [WRITE:7.1]
latency 95th percentile   : 38.1 [WRITE:38.1]
latency 99th percentile   : 204.3 [WRITE:204.3]
latency 99.9th percentile : 465.9 [WRITE:465.9]
latency max               : 1408.4 [WRITE:1408.4]
Total partitions          : 1000000 [WRITE:1000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:01:21
END

 

On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:

Lots of variables you're leaving out.

 

It depends on write size, whether you're using logged batches or not, what consistency level, what RF, whether the writes come in bursts, etc. However, that's all sort of moot for determining "normal". Really, you need a baseline, as all those variables end up mattering a huge amount.

 

I would suggest using Cassandra stress as a baseline and go from there depending on what those numbers say (just pick the defaults).

Sent from my iPhone


On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

yes, it is about 8k writes per node. 

 

 

 

On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com> wrote:

Are you saying 7k writes per node? or 30k writes per node?



.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

 

On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

writes 30k/second is the main thing. 

 

 

On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <da...@gmail.com> wrote:

Assuming you meant 100k, that is likely for something with 16mb of storage (probably way small) where the data is more than 64k and hence will not fit into the row cache.



.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

 

On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:

 

I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB ssd EBS). 

I can reach a cluster-wide write rate of 30k requests/second and a read rate of about 100 requests/second. The cluster OS load is constantly above 10. Is that normal?

 

Thanks!

 

 

Best,

 

Yuan 

 

 

 

 

 

 

-- 

———————— 

Ben Slater 

Chief Product Officer

Instaclustr: Cassandra + Spark - Managed | Consulting | Support

+61 437 929 798

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Hi Jonathan,

The I/O stats are below. I am not sure why one node always has a much higher
kB_read/s than the other nodes. That does not look good.


==============
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          54.78   24.48    9.35    0.96    0.08   10.35

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              2.31        14.64        17.95    1415348    1734856
xvdf            252.68     11789.51      6394.15 1139459318  617996710

=============

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.71    6.57    3.96    0.50    0.19   66.07

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.12         3.63        10.59    3993540   11648848
xvdf             68.20       923.51      2526.86 1016095212 2780187819

===============
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.31    8.08    3.70    0.26    0.23   65.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.07         2.87        10.89    3153996   11976704
xvdf             34.48       498.21      2293.70  547844196 2522227746

================
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.75    8.13    3.82    0.36    0.21   64.73

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.10         3.20        11.33    3515752   12442344
xvdf             44.45       474.30      2511.71  520758840 2757732583
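
To put a number on how far the first node stands out, here is a minimal sketch that compares each node's xvdf kB_read/s against the median of the other three. The read rates are copied from the iostat output above; the node labels and the helper name read_skew are hypothetical, since the output does not say which host is which.

```python
from statistics import median

# kB_read/s for the xvdf data volume on each node, copied from the iostat
# output above. The node labels are hypothetical.
xvdf_read_kb_s = {
    "node1": 11789.51,
    "node2": 923.51,
    "node3": 498.21,
    "node4": 474.30,
}


def read_skew(rates):
    """Return {node: its rate divided by the median rate of the other nodes}."""
    skew = {}
    for node, rate in rates.items():
        others = [r for n, r in rates.items() if n != node]
        skew[node] = rate / median(others)
    return skew


if __name__ == "__main__":
    for node, ratio in sorted(read_skew(xvdf_read_kb_s).items()):
        print(f"{node}: {ratio:.1f}x the median of the other nodes")
```

Note that the first report printed by iostat covers the time since boot, so if these figures come from a single run without an interval, they are long-term averages rather than a current snapshot.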






On Thu, Jul 7, 2016 at 6:54 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> What's your CPU looking like? If it's low, check your IO with iostat or
> dstat. I know some people have used Ebs and say it's fine but ive been
> burned too many times.
> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:
>
>> Hi Riccardo,
>>
>> Very low IO-wait. About 0.3%.
>> No stolen CPU. It is a casssandra only instance. I did not see any
>> dropped messages.
>>
>>
>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>> Pool Name                    Active   Pending      Completed   Blocked
>>  All time blocked
>> MutationStage                     1         1      929509244         0
>>               0
>> ViewMutationStage                 0         0              0         0
>>               0
>> ReadStage                         4         0        4021570         0
>>               0
>> RequestResponseStage              0         0      731477999         0
>>               0
>> ReadRepairStage                   0         0         165603         0
>>               0
>> CounterMutationStage              0         0              0         0
>>               0
>> MiscStage                         0         0              0         0
>>               0
>> CompactionExecutor                2        55          92022         0
>>               0
>> MemtableReclaimMemory             0         0           1736         0
>>               0
>> PendingRangeCalculator            0         0              6         0
>>               0
>> GossipStage                       0         0         345474         0
>>               0
>> SecondaryIndexManagement          0         0              0         0
>>               0
>> HintsDispatcher                   0         0              4         0
>>               0
>> MigrationStage                    0         0             35         0
>>               0
>> MemtablePostFlush                 0         0           1973         0
>>               0
>> ValidationExecutor                0         0              0         0
>>               0
>> Sampler                           0         0              0         0
>>               0
>> MemtableFlushWriter               0         0           1736         0
>>               0
>> InternalResponseStage             0         0           5311         0
>>               0
>> AntiEntropyStage                  0         0              0         0
>>               0
>> CacheCleanupExecutor              0         0              0         0
>>               0
>> Native-Transport-Requests       128       128      347508531         2
>>        15891862
>>
>> Message type           Dropped
>> READ                         0
>> RANGE_SLICE                  0
>> _TRACE                       0
>> HINT                         0
>> MUTATION                     0
>> COUNTER_MUTATION             0
>> BATCH_STORE                  0
>> BATCH_REMOVE                 0
>> REQUEST_RESPONSE             0
>> PAGED_RANGE                  0
>> READ_REPAIR                  0
>>
>>
>>
>>
>>
>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
>> wrote:
>>
>>> Hi Yuan,
>>>
>>> You machine instance is 4 vcpus that is 4 threads (not cores!!!), aside
>>> from any Cassandra specific discussion a system load of 10 on a 4 threads
>>> machine is way too much in my opinion. If that is the running average
>>> system load I would look deeper into system details. Is that IO wait? Is
>>> that CPU Stolen? Is that a Cassandra only instance or are there other
>>> processes pushing the load?
>>> What does your "nodetool tpstats" say? Hoe many dropped messages do you
>>> have?
>>>
>>> Best,
>>>
>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com>
>>> wrote:
>>>
>>>> Thanks Ben! For the post, it seems they got a little better but similar
>>>> result than i did. Good to know it.
>>>> I am not sure if a little fine tuning of heap memory will help or not.
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>>> wrote:
>>>>
>>>>> Hi Yuan,
>>>>>
>>>>> You might find this blog post a useful comparison:
>>>>>
>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>
>>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>>> also some single DC benchmarks of m4.xl clusters plus some discussion of
>>>>> how we went about benchmarking.
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>>
>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>
>>>>>> Yes, here is my stress test result:
>>>>>> Results:
>>>>>> op rate                   : 12200 [WRITE:12200]
>>>>>> partition rate            : 12200 [WRITE:12200]
>>>>>> row rate                  : 12200 [WRITE:12200]
>>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>>> Total errors              : 0 [WRITE:0]
>>>>>> total gc count            : 0
>>>>>> total gc mb               : 0
>>>>>> total gc time (s)         : 0
>>>>>> avg gc time(ms)           : NaN
>>>>>> stdev gc time(ms)         : 0
>>>>>> Total operation time      : 00:01:21
>>>>>> END
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>>>
>>>>>>> Lots of variables you're leaving out.
>>>>>>>
>>>>>>> Depends on write size, if you're using logged batch or not, what
>>>>>>> consistency level, what RF, if the writes come in bursts, etc, etc.
>>>>>>> However, that's all sort of moot for determining "normal" really you need a
>>>>>>> baseline as all those variables end up mattering a huge amount.
>>>>>>>
>>>>>>> I would suggest using Cassandra stress as a baseline and go from
>>>>>>> there depending on what those numbers say (just pick the defaults).
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>>>
>>>>>>> yes, it is about 8k writes per node.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <
>>>>>>> daemeonr@gmail.com> wrote:
>>>>>>>
>>>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>>>
>>>>>>>>
>>>>>>>> *.......*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> writes 30k/second is the main thing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <
>>>>>>>>> daemeonr@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Assuming you meant 100k, that likely for something with 16mb of
>>>>>>>>>> storage (probably way small) where the data is more that 64k hence will not
>>>>>>>>>> fit into the row cache.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *.......*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory
>>>>>>>>>>> and 600GB ssd EBS).
>>>>>>>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>>>>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>>>>>>>> those normal?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Yuan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>> ————————
>>>>> Ben Slater
>>>>> Chief Product Officer
>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>> +61 437 929 798
>>>>>
>>>>
>>>>
>>>
>>

Re: Is my cluster normal?

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

What's your CPU looking like? If it's low, check your IO with iostat or
dstat. I know some people have used EBS and say it's fine, but I've been
burned too many times.
On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yu...@kryptoncloud.com> wrote:

> Hi Riccardo,
>
> Very low IO-wait. About 0.3%.
> No stolen CPU. It is a casssandra only instance. I did not see any dropped
> messages.
>
>
> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked
>  All time blocked
> MutationStage                     1         1      929509244         0
>             0
> ViewMutationStage                 0         0              0         0
>             0
> ReadStage                         4         0        4021570         0
>             0
> RequestResponseStage              0         0      731477999         0
>             0
> ReadRepairStage                   0         0         165603         0
>             0
> CounterMutationStage              0         0              0         0
>             0
> MiscStage                         0         0              0         0
>             0
> CompactionExecutor                2        55          92022         0
>             0
> MemtableReclaimMemory             0         0           1736         0
>             0
> PendingRangeCalculator            0         0              6         0
>             0
> GossipStage                       0         0         345474         0
>             0
> SecondaryIndexManagement          0         0              0         0
>             0
> HintsDispatcher                   0         0              4         0
>             0
> MigrationStage                    0         0             35         0
>             0
> MemtablePostFlush                 0         0           1973         0
>             0
> ValidationExecutor                0         0              0         0
>             0
> Sampler                           0         0              0         0
>             0
> MemtableFlushWriter               0         0           1736         0
>             0
> InternalResponseStage             0         0           5311         0
>             0
> AntiEntropyStage                  0         0              0         0
>             0
> CacheCleanupExecutor              0         0              0         0
>             0
> Native-Transport-Requests       128       128      347508531         2
>      15891862
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     0
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
>
>
>
>
> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com>
> wrote:
>
>> Hi Yuan,
>>
>> You machine instance is 4 vcpus that is 4 threads (not cores!!!), aside
>> from any Cassandra specific discussion a system load of 10 on a 4 threads
>> machine is way too much in my opinion. If that is the running average
>> system load I would look deeper into system details. Is that IO wait? Is
>> that CPU Stolen? Is that a Cassandra only instance or are there other
>> processes pushing the load?
>> What does your "nodetool tpstats" say? Hoe many dropped messages do you
>> have?
>>
>> Best,
>>
>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>
>>> Thanks Ben! For the post, it seems they got a little better but similar
>>> result than i did. Good to know it.
>>> I am not sure if a little fine tuning of heap memory will help or not.
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>>> wrote:
>>>
>>>> Hi Yuan,
>>>>
>>>> You might find this blog post a useful comparison:
>>>>
>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>
>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>> also some single DC benchmarks of m4.xl clusters plus some discussion of
>>>> how we went about benchmarking.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>>
>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>
>>>>> Yes, here is my stress test result:
>>>>> Results:
>>>>> op rate                   : 12200 [WRITE:12200]
>>>>> partition rate            : 12200 [WRITE:12200]
>>>>> row rate                  : 12200 [WRITE:12200]
>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>> Total errors              : 0 [WRITE:0]
>>>>> total gc count            : 0
>>>>> total gc mb               : 0
>>>>> total gc time (s)         : 0
>>>>> avg gc time(ms)           : NaN
>>>>> stdev gc time(ms)         : 0
>>>>> Total operation time      : 00:01:21
>>>>> END
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>>
>>>>>> Lots of variables you're leaving out.
>>>>>>
>>>>>> Depends on write size, if you're using logged batch or not, what
>>>>>> consistency level, what RF, if the writes come in bursts, etc, etc.
>>>>>> However, that's all sort of moot for determining "normal" really you need a
>>>>>> baseline as all those variables end up mattering a huge amount.
>>>>>>
>>>>>> I would suggest using Cassandra stress as a baseline and go from
>>>>>> there depending on what those numbers say (just pick the defaults).
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>>
>>>>>> yes, it is about 8k writes per node.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daemeonr@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>>
>>>>>>>
>>>>>>> *.......*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> writes 30k/second is the main thing.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <
>>>>>>>> daemeonr@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Assuming you meant 100k, that likely for something with 16mb of
>>>>>>>>> storage (probably way small) where the data is more that 64k hence will not
>>>>>>>>> fit into the row cache.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *.......*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>>>>
>>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and
>>>>>>>>>> 600GB ssd EBS).
>>>>>>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>>>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>>>>>>> those normal?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Yuan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>> ————————
>>>> Ben Slater
>>>> Chief Product Officer
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>> +61 437 929 798
>>>>
>>>
>>>
>>
>

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Hi Riccardo,

Very low IO-wait. About 0.3%.
No stolen CPU. It is a Cassandra-only instance. I did not see any dropped
messages.


ubuntu@cassandra1:/mnt/data$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         4         0        4021570         0                 0
RequestResponseStage              0         0      731477999         0                 0
ReadRepairStage                   0         0         165603         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                2        55          92022         0                 0
MemtableReclaimMemory             0         0           1736         0                 0
PendingRangeCalculator            0         0              6         0                 0
GossipStage                       0         0         345474         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0              4         0                 0
MigrationStage                    0         0             35         0                 0
MemtablePostFlush                 0         0           1973         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           1736         0                 0
InternalResponseStage             0         0           5311         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
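
The dropped-message counters are all zero, but the pool table is worth a second look: CompactionExecutor has 55 pending tasks, and Native-Transport-Requests shows 128 active, 128 pending and 15891862 all-time blocked. A minimal sketch for picking such rows out of tpstats output follows. The rows are copied from the output above, one pool per line; the function name flag_pools and the pending_limit threshold are illustrative assumptions, not Cassandra defaults.

```python
import re

# A few rows copied from the nodetool tpstats output above, one pool per line.
TPSTATS = """\
MutationStage                     1         1      929509244         0                 0
ReadStage                         4         0        4021570         0                 0
CompactionExecutor                2        55          92022         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862
"""

# name, active, pending, completed, blocked, all-time blocked
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def flag_pools(text, pending_limit=10):
    """Return names of pools with a backlog or any all-time blocked tasks.

    pending_limit is an arbitrary illustrative threshold.
    """
    flagged = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, blank lines and anything unparseable
        name = match.group(1)
        _active, pending, _completed, _blocked, all_time_blocked = map(
            int, match.groups()[1:]
        )
        if pending > pending_limit or all_time_blocked > 0:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(flag_pools(TPSTATS))
```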





On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <fe...@gmail.com> wrote:

> Hi Yuan,
>
> You machine instance is 4 vcpus that is 4 threads (not cores!!!), aside
> from any Cassandra specific discussion a system load of 10 on a 4 threads
> machine is way too much in my opinion. If that is the running average
> system load I would look deeper into system details. Is that IO wait? Is
> that CPU Stolen? Is that a Cassandra only instance or are there other
> processes pushing the load?
> What does your "nodetool tpstats" say? Hoe many dropped messages do you
> have?
>
> Best,
>
> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>
>> Thanks Ben! For the post, it seems they got a little better but similar
>> result than i did. Good to know it.
>> I am not sure if a little fine tuning of heap memory will help or not.
>>
>>
>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
>> wrote:
>>
>>> Hi Yuan,
>>>
>>> You might find this blog post a useful comparison:
>>>
>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>
>>> Although the focus is on Spark and Cassandra and multi-DC there are also
>>> some single DC benchmarks of m4.xl clusters plus some discussion of how we
>>> went about benchmarking.
>>>
>>> Cheers
>>> Ben
>>>
>>>
>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>
>>>> Yes, here is my stress test result:
>>>> Results:
>>>> op rate                   : 12200 [WRITE:12200]
>>>> partition rate            : 12200 [WRITE:12200]
>>>> row rate                  : 12200 [WRITE:12200]
>>>> latency mean              : 16.4 [WRITE:16.4]
>>>> latency median            : 7.1 [WRITE:7.1]
>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>> Total errors              : 0 [WRITE:0]
>>>> total gc count            : 0
>>>> total gc mb               : 0
>>>> total gc time (s)         : 0
>>>> avg gc time(ms)           : NaN
>>>> stdev gc time(ms)         : 0
>>>> Total operation time      : 00:01:21
>>>> END
>>>>
>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>>
>>>>> Lots of variables you're leaving out.
>>>>>
>>>>> Depends on write size, if you're using logged batch or not, what
>>>>> consistency level, what RF, if the writes come in bursts, etc, etc.
>>>>> However, that's all sort of moot for determining "normal" really you need a
>>>>> baseline as all those variables end up mattering a huge amount.
>>>>>
>>>>> I would suggest using Cassandra stress as a baseline and go from there
>>>>> depending on what those numbers say (just pick the defaults).
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yu...@kryptoncloud.com> wrote:
>>>>>
>>>>> yes, it is about 8k writes per node.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <da...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>
>>>>>>
>>>>>> *.......*
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>> wrote:
>>>>>>
>>>>>>> writes 30k/second is the main thing.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <
>>>>>>> daemeonr@gmail.com> wrote:
>>>>>>>
>>>>>>>> Assuming you meant 100k, that likely for something with 16mb of
>>>>>>>> storage (probably way small) where the data is more that 64k hence will not
>>>>>>>> fit into the row cache.
>>>>>>>>
>>>>>>>>
>>>>>>>> *.......*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198
>>>>>>>> <%28%2B1%29%20415.501.0198>London (+44) (0) 20 8144 9872
>>>>>>>> <%28%2B44%29%20%280%29%2020%208144%209872>*
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yu...@kryptoncloud.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and
>>>>>>>>> 600GB ssd EBS).
>>>>>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>>>>>> those normal?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Yuan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> --
>>> ————————
>>> Ben Slater
>>> Chief Product Officer
>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>> +61 437 929 798
>>>
>>
>>
>

Re: Is my cluster normal?

Posted by Riccardo Ferrari <fe...@gmail.com>.

Hi Yuan,

Your machine instance has 4 vCPUs, that is, 4 threads (not cores!). Aside
from any Cassandra-specific discussion, a system load of 10 on a 4-thread
machine is way too much in my opinion. If that is the running average
system load, I would look deeper into system details. Is that I/O wait? Is
that stolen CPU? Is that a Cassandra-only instance, or are there other
processes pushing the load?
What does your "nodetool tpstats" say? How many dropped messages do you
have?
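
As a rough illustration of the load argument, here is a small sketch that normalises the load average by the number of logical CPUs. The figures 10 and 4 come from this thread; the helper name load_per_cpu is hypothetical, and treating a ratio well above 1 as a sign of queued work is a rule of thumb, not a hard limit.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Load average divided by the number of logical CPUs."""
    return load_avg / cpu_count


if __name__ == "__main__":
    # Figures from the thread: a load of about 10 on a 4-vCPU m4.xlarge.
    print(load_per_cpu(10.0, 4))  # 2.5

    # On a live Unix host, the same ratio from the 1-minute load average.
    one_min, _five_min, _fifteen_min = os.getloadavg()
    print(load_per_cpu(one_min, os.cpu_count() or 1))
```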

Best,

On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yu...@kryptoncloud.com> wrote:

> Thanks Ben! For the post, it seems they got a little better but similar
> result than i did. Good to know it.
> I am not sure if a little fine tuning of heap memory will help or not.
>
>
> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <be...@instaclustr.com>
> wrote:
>
>> Hi Yuan,
>>
>> You might find this blog post a useful comparison:
>>
>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>
>> Although the focus is on Spark and Cassandra and multi-DC there are also
>> some single DC benchmarks of m4.xl clusters plus some discussion of how we
>> went about benchmarking.
>>
>> Cheers
>> Ben
>>
>>
>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yu...@kryptoncloud.com> wrote:
>>
>>> Yes, here is my stress test result:
>>> Results:
>>> op rate                   : 12200 [WRITE:12200]
>>> partition rate            : 12200 [WRITE:12200]
>>> row rate                  : 12200 [WRITE:12200]
>>> latency mean              : 16.4 [WRITE:16.4]
>>> latency median            : 7.1 [WRITE:7.1]
>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>> latency max               : 1408.4 [WRITE:1408.4]
>>> Total partitions          : 1000000 [WRITE:1000000]
>>> Total errors              : 0 [WRITE:0]
>>> total gc count            : 0
>>> total gc mb               : 0
>>> total gc time (s)         : 0
>>> avg gc time(ms)           : NaN
>>> stdev gc time(ms)         : 0
>>> Total operation time      : 00:01:21
>>> END
>>>
>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>>>
>>>> Lots of variables you're leaving out.
>>>>

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Thanks Ben! From the post, it seems they got slightly better but similar
results to mine. Good to know.
I am not sure whether a little fine-tuning of the heap memory would help.


Re: Is my cluster normal?

Posted by Ben Slater <be...@instaclustr.com>.

Hi Yuan,

You might find this blog post a useful comparison:
https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/

Although the focus is on Spark and Cassandra and multi-DC, there are also
some single-DC benchmarks of m4.xl clusters, plus some discussion of how we
went about benchmarking.

Cheers
Ben


————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Hi Ryan,


The Cassandra version is 3.0.6 and the Java version is "1.8.0_91".

Yuan


Re: Is my cluster normal?

Posted by Ryan Svihla <rs...@foundev.pro>.

What versions of Cassandra and Java are you running?

Regards,

Ryan Svihla


Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Yes, here is my stress test result:
Results:
op rate                   : 12200 [WRITE:12200]
partition rate            : 12200 [WRITE:12200]
row rate                  : 12200 [WRITE:12200]
latency mean              : 16.4 [WRITE:16.4]
latency median            : 7.1 [WRITE:7.1]
latency 95th percentile   : 38.1 [WRITE:38.1]
latency 99th percentile   : 204.3 [WRITE:204.3]
latency 99.9th percentile : 465.9 [WRITE:465.9]
latency max               : 1408.4 [WRITE:1408.4]
Total partitions          : 1000000 [WRITE:1000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:01:21
END
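
A summary like the one above is easier to compare between runs once it is turned into numbers. The following is a minimal sketch in Python, assuming the summary text is available as a string. `parse_summary` and `tail_ratio` are hypothetical helper names for illustration, not part of cassandra-stress.

```python
import re

# A few lines copied from the summary above.
SAMPLE = """\
op rate                   : 12200 [WRITE:12200]
latency median            : 7.1 [WRITE:7.1]
latency 99th percentile   : 204.3 [WRITE:204.3]
latency max               : 1408.4 [WRITE:1408.4]
Total partitions          : 1000000 [WRITE:1000000]
avg gc time(ms)           : NaN
Total operation time      : 00:01:21
"""

# "name : number", optionally followed by a "[WRITE:...]" breakdown.
LINE = re.compile(r"^(?P<name>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)\s*(?:\[|$)")


def parse_summary(text):
    """Return {metric name: float} for every purely numeric summary line."""
    metrics = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # NaN and hh:mm:ss values do not match and are skipped
            metrics[match.group("name").lower()] = float(match.group("value"))
    return metrics


def tail_ratio(metrics):
    """How many times slower the 99th percentile is than the median."""
    return metrics["latency 99th percentile"] / metrics["latency median"]


metrics = parse_summary(SAMPLE)
print(metrics["op rate"], round(tail_ratio(metrics), 1))
# Expected run length in seconds: partitions written / ops per second.
print(round(metrics["total partitions"] / metrics["op rate"]))
```

For this run the 99th percentile is roughly 29 times the median, and 1,000,000 partitions at 12,200 ops/second works out to about 82 seconds, in line with the reported 00:01:21.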


Re: Is my cluster normal?

Posted by Ryan Svihla <rs...@foundev.pro>.

You're leaving out a lot of variables.

It depends on write size, whether you're using logged batches, what consistency level, what RF, whether the writes come in bursts, etc. However, that's all somewhat moot for determining "normal": you really need a baseline, as all those variables end up mattering a huge amount.

I would suggest using cassandra-stress as a baseline and going from there depending on what those numbers say (just pick the defaults).
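
As a rough illustration of such a baseline, the sketch below builds cassandra-stress command lines in Python without running them. The node addresses are hypothetical, `stress_command` is an illustrative helper rather than anything shipped with Cassandra, and the exact options should be checked against the help output of the installed version.

```python
import shlex


def stress_command(mode, operations, nodes, threads=None, consistency=None):
    """Build a cassandra-stress argument list; nothing is executed here."""
    if mode == "write":
        argv = ["cassandra-stress", "write"]
    elif mode == "mixed":
        # Equal read and write weight, i.e. a 50/50 mix.
        argv = ["cassandra-stress", "mixed", "ratio(write=1,read=1)"]
    else:
        raise ValueError("mode must be 'write' or 'mixed'")
    argv.append(f"n={operations}")
    if consistency is not None:  # omitted: the tool's default is used
        argv.append(f"cl={consistency}")
    if threads is not None:  # omitted: the tool picks thread counts itself
        argv += ["-rate", f"threads={threads}"]
    argv += ["-node", ",".join(nodes)]
    return argv


# Hypothetical node addresses; replace with real cluster members.
NODES = ["10.0.0.1", "10.0.0.2"]
print(shlex.join(stress_command("write", 1_000_000, NODES)))
print(shlex.join(stress_command("mixed", 1_000_000, NODES,
                                threads=50, consistency="QUORUM")))
```

The first command sticks to the defaults, as suggested above. The second pins the thread count and consistency level so that later runs stay comparable. A mixed run reads existing keys, so it would normally follow a write run.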


Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

Yes, it is about 8k writes per node.
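
The per-node figure follows from simple division. Here is a small worked sketch in Python: the 30k writes/second, 4 nodes, 4 cpus and load of 10 come from this thread, while the replication factor of 3 is only an assumed value (the thread never states it). The load figure is assumed to be per node.

```python
def per_node_rates(cluster_writes_per_sec, nodes, replication_factor):
    """Return (client writes per node, replica writes per node) per second."""
    client = cluster_writes_per_sec / nodes
    # Each client write is stored on `replication_factor` nodes.
    replica = cluster_writes_per_sec * replication_factor / nodes
    return client, replica


def load_per_core(load_average, cores):
    """OS load average divided by the core count of one node."""
    return load_average / cores


# 30k writes/second, 4 nodes, 4 cpus each and load 10 come from the thread.
# RF=3 is an assumed value; the thread never states the replication factor.
print(per_node_rates(30_000, 4, 3))
print(load_per_core(10, 4))
```

With these inputs each node coordinates 7,500 client writes per second, which matches the "about 8k" above. Under the assumed RF of 3, each node would apply 22,500 replica writes per second. A load of 10 on 4 cpus is 2.5 runnable tasks per core.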




Re: Is my cluster normal?

Posted by daemeon reiydelle <da...@gmail.com>.

Are you saying 7k writes per node, or 30k writes per node?


.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


Re: Is my cluster normal?

Posted by Yuan Fang <yu...@kryptoncloud.com>.

The 30k writes/second figure is the main thing.



Re: Is my cluster normal?

Posted by daemeon reiydelle <da...@gmail.com>.

Assuming you meant 100k, that is likely for something with 16mb of storage
(probably way too small) where the data is more than 64k and hence will not
fit into the row cache.

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872
