Posted to user@cassandra.apache.org by Kant Kodali <ka...@peernova.com> on 2017/03/01 03:51:45 UTC

Re: Attached profiled data but need help understanding it

Hi Romain,

Thanks again. My responses are inline.

kant

On Tue, Feb 28, 2017 at 10:04 AM, Romain Hardouin <ro...@yahoo.fr>
wrote:

> > we are currently using 3.0.9.  should we use 3.8 or 3.10
>
> No, don't use 3.X in production unless you really need a major feature.
> I would advise sticking to 3.0.X (i.e. 3.0.11 now).
> You can backport CASSANDRA-11966 easily but of course you have to deploy
> from source as a prerequisite.
>

    By backporting you mean I should cherry-pick the CASSANDRA-11966 commit
and compile from source?
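
    i.e., something along these lines? (the commit SHA is a placeholder I'd
    take from the ticket, and building with ant is assumed):

        git clone https://github.com/apache/cassandra.git && cd cassandra
        git checkout cassandra-3.0.11
        git cherry-pick <commit-sha-from-CASSANDRA-11966>
        ant artifacts    # or plain 'ant' to just build the jars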

>
> > I haven't done any tuning yet.
>
> So it's good news, because maybe there is room for improvement
>
> > Can I change this on a running instance? If so, how? or does it require
> a downtime?
>
> You can throttle compaction at runtime with "nodetool
> setcompactionthroughput". Be sure to read all nodetool commands, some of
> them are really useful for day-to-day tuning/management.
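>
> For example (the MB/s value is just illustrative):
>
>     nodetool getcompactionthroughput
>     nodetool setcompactionthroughput 32    # throttle to 32 MB/s
>     nodetool setcompactionthroughput 0     # 0 disables throttling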
>
> If GC is fine, then check other things -> "[...] different pool sizes for
> NTR, concurrent reads and writes, compaction executors, etc. Also check if
> you can improve network latency (e.g. VF or ENA on AWS)."
>
> Regarding thread pools, some of them can be resized at runtime via JMX.
>
> > 5000 is the target.
>
> Right now you reached 1500. Is it per node or for the cluster?
> We don't know your setup so it's hard to say it's doable. Can you provide
> more details? VM, physical nodes, #nodes, etc.
> Generally speaking LWT should be seldom used. AFAIK you won't achieve
> 10,000 writes/s per node.
>
> Maybe someone on the list already made some tuning for heavy LWT workload?
>

    1500 total for the cluster.

    I have an 8-node Cassandra cluster. Each node is an AWS m4.xlarge instance
(so 4 vCPU, 16 GB, 1 Gbit network = 125 MB/s).



    I have 1 node (m4.xlarge) for my application, which just inserts a bunch
of data, and each insert is an LWT.

    I tested the network throughput of the node; I can get up to 98 MB/s.
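
    (Measured node-to-node with something along these lines; iperf3 here is
    just one way to do it:)

        iperf3 -s                                    # on one Cassandra node
        iperf3 -c <cassandra-node-private-ip> -P 4   # on the application node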

    Now, when I start my application, I see that the Cassandra nodes' receive
rate/throughput is about 4 MB/s (yes, that is megabytes; I checked this by
running sudo iftop -B). The disk I/O is about the same, and the Cassandra
process CPU usage is about 360% (the max is 400% since it is a 4-core
machine). The application node's transmit throughput is about 6 MB/s. So even
with a 4 MB/s receive throughput, the CPU on a Cassandra node is almost maxed
out. I am not sure what this says about Cassandra, but what I can tell is that
the network is way underutilized and that 8 nodes are unnecessary, so we plan
to bring it down to 4 nodes, except each node this time will have 8 cores.
All said, I am still not sure how to scale up from 1500 writes/sec.


>
> Best,
>
> Romain
>
>

Re: Attached profiled data but need help understanding it

Posted by Romain Hardouin <ro...@yahoo.fr>.
Hi Kant,
You'll find more information about ixgbevf here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sriov-networking.html

I repeat myself, but don't underestimate VM placement: same AZ? same placement
group? etc. Note that LWT are not discouraged, but as the doc says: "[...]
reserve lightweight transactions for those situations where they are
absolutely necessary;"

I hope you'll be able to achieve what you want with more powerful VMs. Let us
know!
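
A quick way to check whether the driver is actually in use on an instance
(this is what the AWS doc above describes):

    ethtool -i eth0      # should report "driver: ixgbevf" when enhanced networking is on
    modinfo ixgbevf      # shows the installed module version
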
Best,
Romain
 


Re: Attached profiled data but need help understanding it

Posted by Kant Kodali <ka...@peernova.com>.
Hi Romain,

We may be able to achieve what we need without LWT, but that would require a
bunch of changes on the application side and possibly introducing caching
layers and designing a solution around that. For now, though, we are
constrained to use LWTs for another month or so. All said, I would still like
to see the discouraged features such as LWTs, secondary indexes, and triggers
get better over time, as that would really benefit users.

Agreed, high park/unpark is a sign of excessive context switching, but any
ideas why this is happening? Yes, today we will be experimenting with
c3.2xlarge, see what the numbers look like, and slowly scale up from there.

How do I make sure I install the ixgbevf driver? Don't m4.xlarge or c3.2xlarge
instances already have it? When I googled "ixgbevf driver" it tells me it is
an Ethernet driver... I thought all AWS instances run on Ethernet by default.
Can you please give more context on this?

Thanks,
kant

Re: Attached profiled data but need help understanding it

Posted by Romain Hardouin <ro...@yahoo.fr>.
Also, I should have mentioned that it would be a good idea to spawn your three
benchmark instances in the same AZ, then try with one instance in each AZ to
see how network latency affects your LWT rate. The lowest latency is
achievable with three instances in the same placement group of course, but
it's kinda dangerous for production.
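
(For reference, spawning them in a placement group with the AWS CLI looks
roughly like this; the AMI id, group name and instance type are placeholders:)

    aws ec2 create-placement-group --group-name cass-bench --strategy cluster
    aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type c3.4xlarge \
        --count 3 --placement GroupName=cass-bench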


Re: Attached profiled data but need help understanding it

Posted by Romain Hardouin <ro...@yahoo.fr>.
Hi Kant,

> By backporting you mean I should cherry pick CASSANDRA-11966 commit and compile from source?

Yes.

Regarding the network utilization: you checked throughput, but latency is more
important for LWT. That's why you should make sure your m4 instances (both C*
and client) are using the ixgbevf driver.

I agree 1500 writes/s is not impressive, but 4 vCPUs is low. It depends on the
workload, but my experience is that an AWS instance starts to be powerful at
16 vCPUs (e.g. c3.4xlarge). And beware of EBS (again, that's my experience,
YMMV).

High park/unpark is a sign of excessive context switching. If I were you I
would run an LWT benchmark with 3 x c3.4xlarge or c3.8xlarge (32 vCPUs, SSD
instance store). Spawn spot instances to save money, and be sure to tune
cassandra.yaml accordingly, e.g. concurrent_writes.
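
(For instance, something like this; the 8-per-vCPU rule of thumb and the
values below are only assumptions to validate against your workload:)

    # check the current settings
    grep -E '^concurrent_(reads|writes):' conf/cassandra.yaml
    # on a 32-vCPU benchmark node a starting point could be:
    #   concurrent_writes: 256    (roughly 8 x vCPUs)
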
Finally, a naive question but I must ask you... are you really sure you need LWT? Can't you achieve your goal without it?

Best,
Romain

    On Thursday, March 2, 2017 at 10:31 AM, Kant Kodali <ka...@peernova.com> wrote:
 

Hi Romain,

Any ideas on this? I am not sure why so much time is being spent in the Park
and Unpark methods, as shown in the thread dump. Also, could you please look
into my responses from the other email? It would greatly help.

Thanks,
kant

Re: Attached profiled data but need help understanding it

Posted by Kant Kodali <ka...@peernova.com>.
Hi Romain,

I am using Cassandra version 3.0.9, and here is the generated report
<http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTcvMDMvMS8tLWpzdGFja19kdW1wLm91dC0tMi0yNC00OA==>
(graphical view) of my thread dump as well. Just sending this over in case it
helps.

Thanks,
kant
