Posted to commits@cassandra.apache.org by "Michael Penick (JIRA)" <ji...@apache.org> on 2013/12/31 19:30:52 UTC

[jira] [Created] (CASSANDRA-6534) Slow inserts with collections into a single partition (Pathological GC behavior)

Michael Penick created CASSANDRA-6534:
-----------------------------------------

             Summary: Slow inserts with collections into a single partition (Pathological GC behavior)
                 Key: CASSANDRA-6534
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6534
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: dsc12-1.2.12-1.noarch.rpm
cassandra12-1.2.12-1.noarch.rpm

GC flags:
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8192M
-Xmx8192M
-Xmn2048M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseTLAB
            Reporter: Michael Penick
             Fix For: 1.2.12


We noticed extremely slow insertion rates into a single partition key when using a composite column with a collection value. We were not able to reproduce the issue with the same schema but a non-collection value, even with much larger values.

There are tons of these in the logs:

"GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"

We are inserting a tiny amount of data (32-64 bytes per row) and seeing the issue after only a couple of 10k inserts. The amount of memory used by C*/the JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?

Attached are pictures of the GC behavior under the different tests. Keep in mind that we are only inserting 128KB - 256KB of data, and we are almost hitting the limit of the heap.

Example schemas:

{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value list<int>,
    PRIMARY KEY (row_key, column_key)
);

CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value map<text, text>,
    PRIMARY KEY (row_key, column_key)
);
{code}

Example inserts:

Note: This issue can be replicated with extremely small inserts (as well as larger ~1KB ones).

{code}
INSERT INTO test.test (row_key, column_key, column_value)
VALUES ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);

INSERT INTO test.test (row_key, column_key, column_value)
VALUES ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, {'a': '0123456701234567012345670', 'b': '0123456701234567012345670'});
{code}
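
The test client itself is not included in this report; below is a minimal sketch of the kind of insert loop that shows the pattern described above, assuming the DataStax Java driver (2.x) and the list<int> schema from the first example. The contact point, iteration count, and payload are illustrative, not the exact test setup.

{code}
import java.util.Arrays;
import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Hypothetical reproduction loop: repeatedly insert tiny list<int> values
// under a single partition key, mirroring the example INSERTs above.
public class SinglePartitionCollectionInserts {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        PreparedStatement insert = session.prepare(
            "INSERT INTO test.test (row_key, column_key, column_value) VALUES (?, ?, ?)");

        // All rows go to the same partition ('0000000001'), each carrying a
        // small collection value (~32-64 bytes), as described in the report.
        for (int i = 0; i < 100000; i++) {
            session.execute(insert.bind(
                "0000000001",
                UUID.randomUUID(),
                Arrays.asList(0, 1, 2, 3)));
        }

        cluster.close();
    }
}
{code}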

As a comparison, I was able to run the same tests with the following schema with no issue:

Note: This test ran at a much faster insertion rate and with much bigger column values (1KB) without any GC issues.

{code}
CREATE TABLE test.test (
    row_key text,
    column_key uuid,
    column_value text,
    PRIMARY KEY (row_key, column_key)
);
{code}
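
For reference, the comparison test would look along these lines, reusing the session from the sketch above but binding a ~1KB text value instead of a collection (payload and loop count are illustrative):

{code}
// Hypothetical comparison loop for the text-valued schema above, reusing
// the `session` from the previous sketch. Same single partition key, but
// each row carries a plain ~1KB text value instead of a collection.
StringBuilder sb = new StringBuilder();
while (sb.length() < 1024) {
    sb.append("0123456701234567"); // filler payload, illustrative only
}
String oneKbValue = sb.toString();

PreparedStatement insertText = session.prepare(
    "INSERT INTO test.test (row_key, column_key, column_value) VALUES (?, ?, ?)");

for (int i = 0; i < 100000; i++) {
    session.execute(insertText.bind("0000000001", UUID.randomUUID(), oneKbValue));
}
{code}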




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)