Posted to commits@cassandra.apache.org by "Peter Kovgan (JIRA)" <ji...@apache.org> on 2015/12/24 07:46:49 UTC
[jira] [Updated] (CASSANDRA-10937) OOM on multiple nodes during test of insert load

     [ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Kovgan updated CASSANDRA-10937:
-------------------------------------
    Description: 
8 cassandra nodes.

The load test was started with 4 clients (different machines with unequal specifications), each running 1000 threads.
Each thread is assigned, round-robin, to run one of 4 different inserts.
Consistency level: ONE.
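
To make the thread assignment concrete, here is a minimal sketch of the round-robin mapping described above. It is not the attached load client: the class and method names are hypothetical, it assumes thread i is bound to insert (i mod 4), and it leaves out all Cassandra driver calls.

```java
import java.util.Arrays;

// Hypothetical sketch of the round-robin assignment; not the attached client.
final class RoundRobinLoadSketch {
    // Number of different insert statements used in the test.
    static final int INSERT_KINDS = 4;

    // Assumption: thread number i always runs insert number (i mod 4).
    static int insertKindFor(int threadIndex) {
        return threadIndex % INSERT_KINDS;
    }

    // Counts how many of a client's threads end up on each insert kind.
    static int[] workersPerKind(int threadCount) {
        int[] counts = new int[INSERT_KINDS];
        for (int i = 0; i < threadCount; i++) {
            counts[insertKindFor(i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // One client with 1000 threads: prints [250, 250, 250, 250]
        System.out.println(Arrays.toString(workersPerKind(1000)));
    }
}
```

Under this assumption each client runs 250 threads per insert type, so the 4 clients together keep 1000 threads on each of the 4 inserts.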

I attach the full CQL schema of the tables and the insert queries.

Replication factor - 2:
create keyspace OBLREPOSITORY_NY with replication = {'class':'NetworkTopologyStrategy','NY':2};

Initial throughput is:
215,000 inserts/sec
or
54 MB/sec, considering a single insert size a bit larger than 256 bytes.
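
As a rough cross-check of these figures, the arithmetic can be sketched as follows. The class and method names are hypothetical, only the 256-byte BLOB is counted per insert (so the result is a lower bound on the payload), and the per-node figure assumes writes are spread evenly over the 8 nodes, which the report does not state.

```java
import java.util.Locale;

// Hypothetical back-of-the-envelope estimate; not part of the attached material.
final class ThroughputEstimate {
    // Payload bytes sent by the clients per second.
    static long payloadBytesPerSecond(long insertsPerSecond, long bytesPerInsert) {
        return insertsPerSecond * bytesPerInsert;
    }

    // Bytes written across the cluster once every row is stored on each replica.
    static long replicatedBytesPerSecond(long payloadBytesPerSecond, int replicationFactor) {
        return payloadBytesPerSecond * replicationFactor;
    }

    public static void main(String[] args) {
        long payload = payloadBytesPerSecond(215_000, 256);     // 55,040,000 bytes/s
        long replicated = replicatedBytesPerSecond(payload, 2); // replication factor 2
        System.out.println(String.format(Locale.ROOT,
                "client payload: %.2f MB/s (%.2f MiB/s)",
                payload / 1e6, payload / (1024.0 * 1024.0)));
        System.out.println(String.format(Locale.ROOT,
                "written across replicas: %.2f MB/s", replicated / 1e6));
        // Assumption: even distribution over 8 nodes.
        System.out.println(String.format(Locale.ROOT,
                "average per node: %.2f MB/s", replicated / 1e6 / 8));
    }
}
```

This gives about 55.0 MB/s (about 52.5 MiB/s) of client payload, in line with the 54 MB/sec quoted above, and, with replication factor 2, roughly 110 MB/s written cluster-wide, or about 13.8 MB/s per node on average under the even-distribution assumption.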

Data:
All fields (5-6) are short strings, except one, which is a BLOB of 256 bytes.

After about 2-3 hours of running, I was forced to increase the timeout from 2000 to 5000 ms, because some requests failed with the shorter timeout.

Later on (after approx. 12 hours of running), OOM happened on multiple nodes.
(Logs of all failed nodes are attached.)

I also attach an example Java load client.

Attachments:
test2.rar - contains most of the material
more-logs.rar - contains additional node logs




  was:
8 cassandra nodes.

Load test started with 4 clients(different and not equal machines), each running 1000 threads.
Each thread assigned in round-robin way to run one of 4 different inserts. 
Consistency->ONE.

I attach the full CQL schema of tables and the query of insert.

Replication factor - 2:
create keyspace OBLREPOSITORY_NY with replication = {'class':'NetworkTopologyStrategy','NY':2};

Initiall throughput is:
215.000  inserts /sec
or
54Mb/sec, considering single insert size a bit larger than 256byte.

Data:
all fields(5-6) are short strings, except one is BLOB of 256 bytes.

After about a 2-3 hours of work, I was forced to increase timout from 2000 to 5000ms, for some requests failed for short timeout.

Later on(after aprox. 12 hous of work) OOM happens on multiple nodes.
(all failed nodes logs attached)

I attach also example java load client.

Attachments:
test2.rar -contains most of material
more-logs.rar - contains additional nodes logs





> OOM on multiple nodes during test of insert load
> ------------------------------------------------
>
>                 Key: CASSANDRA-10937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10937
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra : 3.0.0
> Installed from an unpacked archive, not through any OS-specific installer.
> Java:
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> OS :
> Linux version 2.6.32-431.el6.x86_64 (mockbuild@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Sun Nov 10 22:19:54 EST 2013
> We have:
> 8 guests (Linux OS as above) on 2 VMware-managed physical IBM M6 hosts. Each physical host runs 4 guests.
> Each guest is assigned:
> 1 disk, 300 GB, for seq. log
> 1 disk, 4 TB, for data
> 11 CPU cores
> Disks are local, not shared.
> (lshw and cpuinfo attached in file)
>            Reporter: Peter Kovgan
>             Fix For: 3.0.3
>
>         Attachments: more-logs.rar, test2.rar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)