
Posted to user@cassandra.apache.org by Andrew Cooper <An...@nisc.coop> on 2013/11/22 22:35:49 UTC

Large system.Migration CF after upgrade to 1.1

We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB) row in system.Migrations on each node. We are also seeing heap pressure and full GC issues when we make schema updates on this cluster. If the two are related, is it possible to remove or truncate the system.Migrations CF? If I understand correctly, version 1.1 no longer uses this CF and uses the system.schema_* CFs instead. We have multiple clusters, and those built from scratch at version 1.1 or 1.2 do not have data in system.Migrations.

I would appreciate any advice and I can provide more details if needed.

-Andrew

Andrew Cooper
National Information Solutions Cooperative®
3201 Nygren Drive NW
Mandan, ND 58554
* e-mail: andrew.cooper@nisc.coop
* phone: 866.999.6472 ext 6824
* direct: 701-667-6824

Re: Large system.Migration CF after upgrade to 1.1

Posted by Andrew Cooper <An...@nisc.coop>.

>> We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB) row in system.Migrations on each cluster node.
> There is some code in there to drop that CF at startup, but I’m not sure of the requirements for it to run. If the file timestamps have not been updated in a while, move the files out of the way and restart.

I was able to clear the Schema and Migrations CFs with: nodetool drain -> stop -> move SSTables out -> start
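
The "move SSTables out" step in that sequence can be sketched in Python. This is only an illustration under assumptions not stated in the thread: the node has already been drained and fully stopped, the CF's SSTables sit in a directory of their own (the `cf_dir` argument; the path shown in the comment is hypothetical), and the function name `move_sstables_aside` is made up for this example. Files are moved rather than deleted so they can be restored if needed.

```python
import shutil
from pathlib import Path
from typing import List


def move_sstables_aside(cf_dir: Path, backup_dir: Path) -> List[str]:
    """Move every regular file in cf_dir into backup_dir; return the moved names."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before anything is moved.
    for path in sorted(cf_dir.iterdir()):
        if not path.is_file():
            continue
        target = backup_dir / path.name
        if target.exists():
            # Refuse to overwrite an earlier backup.
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(path), str(target))
        moved.append(path.name)
    return moved


# Hypothetical usage; run only while the node is stopped:
# move_sstables_aside(Path("/var/lib/cassandra/data/system/Migrations"),
#                     Path("/var/lib/cassandra/backup/Migrations"))
```
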

>> We are also seeing heap pressure / Full GC issues when we do schema updates to this cluster
> How much memory does the machine have and how is the JVM configured?

Here are our current specs:
28 nodes.
48GB OS memory, 8 to 12 cores
3-drive RAID 0, SATA 7200 RPM
~333GB load per node
6 keyspaces, ~3000 Column Families (a knowingly bad design, basically a set of CFs per tenant)
We had to ratchet up the heap a few times and currently run at 20GB (up from the original 8GB). GC can usually get down to roughly 8 or 9GB, but we seem to need a bit of extra headroom to stay stable.
Row Cache and Key Cache disabled (due to heap pressure issues a while back)

Load profile is usually more reads than writes, but constant on both

Schema updates seem to be better, though I am not sure whether removing the old CFs did the trick. We still get sporadic situations where heap usage skyrockets on a few nodes, which starts a domino effect of nodes going offline around the ring and results in lots of client timeouts.
I suspect most of our issues lie in the code design of the applications on top of Cassandra, but it is always difficult to troubleshoot the root cause when Cassandra gets into its funk.

We are slowly migrating keyspaces into their own clusters for further isolation and performance gains, which will help long term. We are also in the process of consolidating CFs to reduce the overall schema size.

> On pre 1.1 that is often a result of memory pressure from the bloom filters and compression metadata being on the JVM heap. Do you have a lot (i.e. > 500 million) of rows per node?
>
> Check how small CMS can get the heap; it may be that it simply cannot reduce it further.
>
> As a workaround you can: increase the heap, increase bloom_filter_fp_chance (per CF) and index_interval (yaml). My talk called “In case of emergency break glass” at the summit in SF this year talks about this http://thelastpickle.com/speaking/2013/06/11/Speaking-Cassandra-Summit-SF-2013.html

I was at your talk and really appreciated the information; we have used it in our scale-out to a second datacenter.

> Long term, moving to 1.2 will help.

We do plan to upgrade to 1.2 within the next month. All of our other clusters are already running 1.2, but this is our largest and most problematic cluster :)

-Andrew

Re: Large system.Migration CF after upgrade to 1.1

Posted by Aaron Morton <aa...@thelastpickle.com>.

> We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB) row in system.Migrations on each cluster node. 
There is some code in there to drop that CF at startup, but I’m not sure of the requirements for it to run. If the file timestamps have not been updated in a while, move the files out of the way and restart.

> We are also seeing heap pressure / Full GC issues when we do schema updates to this cluster
How much memory does the machine have and how is the JVM configured?

On pre 1.1 that is often a result of memory pressure from the bloom filters and compression metadata being on the JVM heap. Do you have a lot (i.e. > 500 million) of rows per node?

Check how small CMS can get the heap; it may be that it simply cannot reduce it further.
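
As a rough illustration of that check, the sketch below takes heap-used readings sampled right after CMS cycles and reports the floor the collector reaches. The helper name `heap_floor` and the sample readings are hypothetical; how the readings are gathered (GC logs, JMX, and so on) is left open.

```python
from typing import Dict, List


def heap_floor(heap_gb: float, post_gc_used_gb: List[float]) -> Dict[str, float]:
    """Summarise how far GC manages to shrink heap usage."""
    floor = min(post_gc_used_gb)  # lowest usage observed after a collection
    return {
        "floor_gb": floor,
        "floor_fraction": floor / heap_gb,  # share of the heap that never frees up
        "headroom_gb": heap_gb - floor,     # space left for new allocations
    }


if __name__ == "__main__":
    # Hypothetical readings: a 20GB heap that GC brings down to 8-9GB.
    print(heap_floor(20.0, [8.0, 9.0]))
    # The same floor on an 8GB heap would leave no headroom at all.
    print(heap_floor(8.0, [8.0, 9.0]))
```

If the floor is close to the configured heap size, the collector has nothing left to reclaim and only a larger heap or a smaller live set will help.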

As a workaround you can: increase the heap, increase bloom_filter_fp_chance (per CF) and index_interval (yaml). My talk called “In case of emergency break glass” at the summit in SF this year talks about this http://thelastpickle.com/speaking/2013/06/11/Speaking-Cassandra-Summit-SF-2013.html
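
A rough sense of why those two settings matter can be had from a back-of-the-envelope calculation. The sketch below assumes the textbook sizing for an optimal Bloom filter (about -ln(p) / (ln 2)^2 bits per key) rather than Cassandra's exact implementation, and assumes one index sample is kept per index_interval rows. The 500 million row count is the threshold mentioned above; the function names and the fp chance and interval values are illustrative only.

```python
import math


def bloom_filter_bytes(rows: int, fp_chance: float) -> float:
    """Approximate size in bytes of an optimal Bloom filter (textbook formula)."""
    bits_per_row = -math.log(fp_chance) / (math.log(2) ** 2)
    return rows * bits_per_row / 8


def index_sample_entries(rows: int, index_interval: int) -> int:
    """Sampled index entries if one row in every index_interval is kept."""
    return rows // index_interval


if __name__ == "__main__":
    rows = 500_000_000
    for fp in (0.01, 0.1):
        mb = bloom_filter_bytes(rows, fp) / 1e6
        print(f"fp_chance={fp}: ~{mb:.0f} MB of bloom filter")
    for interval in (128, 512):
        samples = index_sample_entries(rows, interval)
        print(f"index_interval={interval}: {samples:,} samples")
```

Under this formula, raising the fp chance from 0.01 to 0.1 halves the filter size, at the cost of more reads that hit disk for keys that are not there. Raising index_interval shrinks the sample count proportionally, at the cost of longer scans within the index.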

Long term, moving to 1.2 will help.

Hope that helps. 

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
