Posted to user@cassandra.apache.org by Mun Dega <mu...@gmail.com> on 2018/09/01 04:19:43 UTC

Re: Upgrade from 2.1 to 3.11

I think I have narrowed down the constant Full GC to an accumulation of
large partitions resulting from the upgrade to 3.11.
How the large partitions were produced may be related to
https://issues.apache.org/jira/browse/CASSANDRA-11887

I started to see duplicate records in one of the tables, which probably
grew partitions that were already fairly large to well over 300MB, 500MB,
and in some cases 800MB.  After the upgrade, my system.log started to
report "Writing large partition" during compaction quite frequently.
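
One way to see which partitions trigger the warning most often is to count
the matching lines in system.log. The following is only a minimal sketch:
the phrase "Writing large partition" comes from the message above, but the
function name and the idea that the token right after the phrase identifies
the partition are assumptions, so check them against your own log lines.

```python
from collections import Counter
from typing import Iterable

# Phrase quoted in the message above; everything else here is illustrative.
MARKER = "Writing large partition"


def count_large_partition_warnings(lines: Iterable[str]) -> Counter:
    """Count warning lines, grouped by the token that follows MARKER.

    Assumption: the first whitespace-separated token after MARKER
    identifies the partition. Verify this against your own system.log.
    """
    counts: Counter = Counter()
    for line in lines:
        _, found, rest = line.partition(MARKER)
        if not found:
            continue  # not a large-partition warning
        tokens = rest.split()
        key = tokens[0] if tokens else "<unknown>"
        counts[key] += 1
    return counts
```

To use it, open the log file (its location depends on the installation),
pass the file object to count_large_partition_warnings, and print
most_common(10) on the result to list the ten noisiest partitions.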

The suggestion on CASSANDRA-11887 is to run nodetool scrub, which I will
try next to see whether the duplicate records are purged.
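
For anyone scripting this per table, here is a small sketch that only builds
the nodetool scrub command line so it can be reviewed before it is run. The
helper names and the keyspace and table names in the usage note are
hypothetical; nodetool scrub itself takes a keyspace followed by optional
table names.

```python
import shlex
from typing import List, Sequence


def scrub_command(keyspace: str, tables: Sequence[str] = ()) -> List[str]:
    """Build the argv for `nodetool scrub <keyspace> [<table> ...]`.

    This only constructs the command; it does not execute anything.
    """
    if not keyspace:
        raise ValueError("keyspace must not be empty")
    return ["nodetool", "scrub", keyspace, *tables]


def render(cmd: Sequence[str]) -> str:
    """Render an argv list as a shell-safe string for logging or review."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, render(scrub_command("my_ks", ["my_table"])) gives the string
"nodetool scrub my_ks my_table". The list form can be handed to
subprocess.run on each node once the command looks right.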

Is anyone else familiar with this problem?  The ticket says scrub fixes the
duplicate record issue, but I'm not sure it would be that simple.  I think
the duplicate records are the cause of the problem, but I don't see any
open issue on this other than 11887.

On Tue, Aug 28, 2018 at 12:14 PM ZAIDI, ASAD A <az...@att.com> wrote:

> You may want to check whether you coincidentally have expired cells in
> the heap. The GC log should be able to tell you, or look for tombstones in
> the system.log file. Check that your compactions are under control and
> normal.  This may not be related to the upgrade at all!
>
>
>
>
>
> *From:* Pradeep Chhetri [mailto:pradeep@stashaway.com]
> *Sent:* Tuesday, August 28, 2018 3:32 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Upgrade from 2.1 to 3.11
>
>
>
> You may want to try upgrading to 3.11.3 instead, which has some memory
> leak fixes.
>
>
>
> On Tue, Aug 28, 2018 at 9:59 AM, Mun Dega <mu...@gmail.com> wrote:
>
> I am surprised that no one else ran into any issues with this version.  GC
> can't catch up fast enough and there is constant Full GC taking place.
>
>
>
> The result? Unresponsive nodes making the entire cluster unusable.
>
>
>
> Any insight on this issue from anyone that is using this version would be
> appreciated.
>
>
>
> Ma
>
>
>
> On Fri, Aug 24, 2018, 04:30 Mohamadreza Rostami <
> mohamadrezarostami2@gmail.com> wrote:
>
> You have a very large heap; it spends most of its CPU time in the GC
> stage. You should set the heap to 12GB at most and enable the row cache
> so that your cluster becomes faster.
>
> On Friday, 24 August 2018, Mun Dega <mu...@gmail.com> wrote:
>
> 120G data
>
> 28G heap out of 48 on system
>
> 9 node cluster, RF3
>
>
>
> On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami <
> mohamadrezarostami2@gmail.com> wrote:
>
> Hi,
>
> How much data do you have? How much RAM do your servers have? How large
> is your heap?
>
> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega <mu...@gmail.com> wrote:
>
> Hello,
>
>
>
> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.  The
> process went OK, including upgradesstables, but afterwards we started to
> experience high r/w latency, occasional OOMs, and long GC pauses.
>
>
>
> For the same cluster with 2.1, we didn't have any issues like this.  We
> also kept the server specs, heap, and everything else the same after the
> upgrade.
>
>
>
> Has anyone else had similar issues going to 3.11, and what are the major
> changes that could cause such a major setback in the new version?
>
>
>
> Ma Dega
>
>
>

Re: Upgrade from 2.1 to 3.11

Posted by Gosar M <ko...@yahoo.com.INVALID>.
Hello,
Yes, we encountered the same issue. See CASSANDRA-13125, CASSANDRA-12144, and CASSANDRA-14008.
The scrub helped us, but it took almost 4-5 hours for one table of 145GB per node. We are still scrubbing our table to resolve this issue. The next step is to upgrade to the next version as well.
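
As a rough planning aid, the figures above (145GB in 4-5 hours) can be turned into a scrub rate and applied to other table sizes. This is only back-of-the-envelope arithmetic: it assumes scrub time scales linearly with table size, which may not hold, and the 300GB table used below is a hypothetical example.

```python
def scrub_rate_gb_per_hour(size_gb: float, hours: float) -> float:
    """Observed scrub throughput in GB per hour."""
    return size_gb / hours


def estimate_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Estimated scrub time, assuming time scales linearly with size."""
    return size_gb / rate_gb_per_hour


# Figures reported above: one 145GB table took 4-5 hours per node.
slow_rate = scrub_rate_gb_per_hour(145, 5)  # 29.0 GB/h
fast_rate = scrub_rate_gb_per_hour(145, 4)  # 36.25 GB/h

# Hypothetical 300GB table, bracketed by the two observed rates.
best_case = estimate_hours(300, fast_rate)
worst_case = estimate_hours(300, slow_rate)
```

Under these assumptions a 300GB table would take somewhere between roughly 8 and 10.5 hours per node.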

Thank you
