Posted to dev@cassandra.apache.org by Shashank Joshi <sh...@ericsson.com> on 2017/02/02 00:23:03 UTC

Performance issue in 3.0.9

We are seeing major performance issues with about 100 GB of data in 3.0.9-E001. The exact same app runs very well in 2.1.



It feels to us like something is wrong with our configuration because of the severity of the issues. Thanks in advance for any recommendations or suggestions.



Details:

Size of data: 100 GB+, all in one table with a simple schema (a couple of bigints and a double)

Cluster: 3 nodes with RF of 3

Client: App uses read and write CL of QUORUM, and we see a lot of timeouts due to inability to reach quorum

Compaction: Leveled

Nature of data usage: No updates/deletes, high reads, relatively low writes





JVM:

Using CMS GC and around 8 GB of max heap

Re: Performance issue in 3.0.9

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Also, if you have been tracking your performance metrics in graph form
(before and after the upgrade), that would be extremely helpful.
On Thu, Feb 2, 2017 at 5:29 AM Romain Hardouin <ro...@yahoo.fr.invalid>
wrote:

> Yes, you should provide more context. "Lots of timeouts": read? write?
> both? Did you run sstableupgrade? Java version? (C* 3.0 requires Java 8u40
> or later.) What is your data model? Lots of counters? Compression enabled on
> tables? "No updates/deletes": no deletes, but is there a TTL on the data?
> Etc.
> Best,
> Romain
> On Thursday, 2 February 2017 at 9:52, Benjamin Lerer <
> benjamin.lerer@datastax.com> wrote:
>
>
> Guys,
>
> If you really want us to improve things, you need to be a bit more
> helpful.
> We have no clue what problems or changes in performance you are
> seeing.
> So it would be great if you could provide more context and facts.
> The more help and clarity you can provide, the easier it will be for us to
> investigate and solve those problems.
>
> By helping us you will help yourselves.
>
> On Thu, Feb 2, 2017 at 9:38 AM, Matija Gobec <ma...@gmail.com> wrote:
>
> > We ran for months with the same highly tuned setup on 2.1, and once we
> > switched to 3.0.9 the performance with the same configuration was crap.
> > Leveled compaction, but a few more nodes. There are differences in how 2.1
> > and 3.0 work, so I guess you need to revisit your cassandra.yaml and OS
> > settings.
> > Next to everything Jeff mentioned, is there any reason you have data on
> > all nodes and use QUORUM?
> > Also, is there any reason you are not using G1 with 3.0?
> >
> > On Thu, Feb 2, 2017 at 6:28 AM, Jeff Jirsa <jj...@gmail.com> wrote:
> >
> > > Can you quantify "major"?
> > >
> > > Latency or throughput?
> > > GC pauses?
> > > What did you see before? What do you see now?
> > > Do you have a stack dump?
> > >
> > >
> > > --
> > > Jeff Jirsa

Re: Performance issue in 3.0.9

Posted by Romain Hardouin <ro...@yahoo.fr.INVALID>.

Yes, you should provide more context. "Lots of timeouts": read? write? both? Did you run sstableupgrade? Java version? (C* 3.0 requires Java 8u40 or later.) What is your data model? Lots of counters? Compression enabled on tables? "No updates/deletes": no deletes, but is there a TTL on the data? Etc.
Best,
Romain

Re: Performance issue in 3.0.9

Posted by Benjamin Lerer <be...@datastax.com>.

Guys,

If you really want us to improve things, you need to be a bit more
helpful.
We have no clue what problems or changes in performance you are
seeing.
So it would be great if you could provide more context and facts.
The more help and clarity you can provide, the easier it will be for us to
investigate and solve those problems.

By helping us you will help yourselves.


RE: Performance issue in 3.0.9

Posted by Shashank Joshi <sh...@ericsson.com>.

Hi Matija,
Your experience mirrors ours. Can you please share any lessons learned or suggestions you might have?

We are using CMS because that is the default setting that came with 3.0.9. We had read that G1 was supposed to be the default setting in 3.0, but the following links made it seem as if CMS was working better for 3.0:
https://issues.apache.org/jira/browse/CASSANDRA-10326 
http://cstar.datastax.com/graph?stats=518e5484-5ee3-11e5-b421-42010af0688f&metric=99.9th_latency&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=865.37&ymin=0&ymax=158.51

If this is incorrect, we can certainly try with G1. Would that be a recommendation?

Thank you to all the others who have asked for more details. Here is some more information.

The performance hit is something like 80 times worse than 2.1, if it even completes the standard read-write operations that we are running. It is actually worse than that because a lot of the reads and writes are failing with timeouts due to lack of quorum. We also tried with a CL of ONE for both reads and writes just to see if that worked, but that also failed.

Since we had problems when we upgraded to 3.0 with 2.1 data, we reproduced the performance problem by starting clean in 3.0 and creating all the data fresh in 3.0. In this case, we loaded the data into one node and let replication take care of updating the other two. We are using RF of 3 with 3 nodes because our app uses QUORUM for better consistency, and we want an HA setup that tolerates the failure of one node at a time.
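
A minimal sketch of the quorum arithmetic behind that choice, assuming Cassandra's usual rule that QUORUM needs floor(RF / 2) + 1 replicas to respond. The function names are illustrative only and are not part of any Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM is still reachable."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # RF of 3 on a 3-node cluster, as described above
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replicas, "
          f"tolerates {tolerated_failures(rf)} node(s) down")
```

With RF 3, two replicas must answer every request, so one slow or unavailable node is tolerated, but a second one is enough to produce QUORUM timeouts.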

Regarding compaction:
We do not update or delete data, nor do we have TTLs on data at this time. So compaction, if any, should not be a major concern, yet we do see it happening. We even tested with autocompaction turned off but did not see any improvement.

Thank you for any insights.

-----Original Message-----
From: Matija Gobec [mailto:matija0204@gmail.com] 
Sent: Thursday, February 02, 2017 12:39 AM
To: dev@cassandra.apache.org
Subject: Re: Performance issue in 3.0.9

We ran for months with the same highly tuned setup on 2.1, and once we switched to 3.0.9 the performance with the same configuration was crap.
Leveled compaction, but a few more nodes. There are differences in how 2.1 and 3.0 work, so I guess you need to revisit your cassandra.yaml and OS settings.
Next to everything Jeff mentioned, is there any reason you have data on all nodes and use QUORUM?
Also, is there any reason you are not using G1 with 3.0?


Re: Performance issue in 3.0.9

Posted by Matija Gobec <ma...@gmail.com>.

We ran for months with the same highly tuned setup on 2.1, and once we
switched to 3.0.9 the performance with the same configuration was crap.
Leveled compaction, but a few more nodes. There are differences in how 2.1
and 3.0 work, so I guess you need to revisit your cassandra.yaml and OS
settings.
Next to everything Jeff mentioned, is there any reason you have data on all
nodes and use QUORUM?
Also, is there any reason you are not using G1 with 3.0?


Re: Performance issue in 3.0.9

Posted by Jeff Jirsa <jj...@gmail.com>.

Can you quantify "major"?

Latency or throughput?
GC pauses? 
What did you see before? What do you see now?
Do you have a stack dump? 


-- 
Jeff Jirsa

