Posted to commits@cassandra.apache.org by "Eddy Truyen (Jira)" <ji...@apache.org> on 2020/04/21 14:26:00 UTC

[jira] [Commented] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

    [ https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088741#comment-17088741 ] 

Eddy Truyen commented on CASSANDRA-15717:
-----------------------------------------

Hi,

The performance overhead turned out to be caused by a --cgroup-parent option that is set on Kubernetes-orchestrated containers but not on the stand-alone Docker container. For more information, see the following [Kubernetes issue|https://github.com/kubernetes/kubernetes/issues/90133].
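For anyone trying to reproduce this outside Kubernetes: the overhead should be reproducible in plain Docker by starting the container under an explicit parent cgroup. A minimal sketch (the container name is illustrative; /kubepods is the cgroup root the kubelet uses with the cgroupfs driver, so adjust it to your node's cgroup driver):

{{docker run -d --name cassandra-parented --cgroup-parent=/kubepods decomads/cassandra:2.2.16}}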

> Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/benchmark
>            Reporter: Eddy Truyen
>            Priority: Normal
>         Attachments: nodetool-compaction-history-docker-cassandra.txt, nodetool-compaction-history-kubeadm-cassandra.txt
>
>
> Sorry for the slightly irrelevant post. This is not an issue with Cassandra but possibly with the interaction between Cassandra and Kubernetes.
> We experienced a performance degradation when running a single Cassandra instance inside a kubeadm 1.14 cluster in comparison with running the Docker container stand-alone.
>  We ran a write-only workload (YCSB benchmark workload A, load phase) against the following user table:
>  
> {{cqlsh> create keyspace ycsb
>     WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1};
> cqlsh> USE ycsb;
> cqlsh> create table usertable (
>     y_id varchar primary key,
>     field0 varchar,
>     field1 varchar,
>     field2 varchar,
>     field3 varchar,
>     field4 varchar,
>     field5 varchar,
>     field6 varchar,
>     field7 varchar,
>     field8 varchar,
>     field9 varchar);}}
> And using the following script:
>  
> {{python ./bin/ycsb load cassandra2-cql -P workloads/workloada \
>     -p recordcount=1500000 -p operationcount=1500000 -p measurementtype=raw \
>     -p cassandra.connecttimeoutmillis=60000 -p cassandra.readtimeoutmillis=60000 \
>     -target 1500 -threads 20 -p hosts=localhost \
>     > results/cassandra-docker/cassandra-docker-load-workloada-1-records-1500000-rnd-1762034446.txt
> sleep 15}}
> We used the following image: {{decomads/cassandra:2.2.16}}, which uses the official {{cassandra:2.2.16}} as base image and adds a readinessProbe to it.
> We used identical Docker configuration parameters by ensuring that the output of {{docker inspect}} was as close to identical as possible. First we ran the YCSB benchmark in a container co-located with the cassandra container in a single pod. Kubernetes then starts both containers with network mode {{net=container:...}}, i.e. a separate infrastructure container links the ycsb and cassandra containers into the same network namespace so they can talk via localhost. This way we hoped to avoid interference from the CNI network plugin.
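> The stand-alone setup can be wired together the same way (a sketch; the container names and the pause image tag are illustrative):
> {{# An idle "pause"-style container that only holds the network namespace
> docker run -d --name pause k8s.gcr.io/pause:3.1
> # Cassandra joins that namespace, so YCSB in the same namespace reaches it via localhost
> docker run -d --name cassandra --net=container:pause decomads/cassandra:2.2.16}}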
> We ran the docker-only container on the Kubernetes node using the default bridge network.
> We first performed the experiment on an OpenStack VM with Ubuntu 16.04 (4 GB RAM, 4 CPU cores, 50 GB disk) that runs on a physical node with 16 CPU cores. Storage, however, is Ceph and therefore distributed.
> To rule out Ceph's distributed storage, we repeated the experiment on minikube+VirtualBox (12 GB RAM, 4 CPU cores, 30 GB disk) on a Windows 10 laptop with 4 cores/8 logical processors and 16 GB RAM. The same performance degradation was measured, however.
> Observations (on Ubuntu/OpenStack):
>  * Docker:
>  ** Mean response latency of the YCSB benchmark: 1.5 ms-1.7 ms
>  * Kubernetes:
>  ** Mean response latency of the YCSB benchmark: 2.7 ms-3.0 ms
>  * CPU usage of the Cassandra daemon JVM is much lower under Kubernetes than under Docker, measured as sketched below (see my position paper: [https://lirias.kuleuven.be/2788169?limo=0]):
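> A sketch of the sampling commands (container and pod names are illustrative; {{kubectl top}} assumes metrics-server is installed):
> {{# Stand-alone Docker
> docker stats --no-stream cassandra
> # Kubernetes
> kubectl top pod cassandra-0}}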
> Possible causes:
>  * Network overhead of the virtual bridge in Kubernetes is, in our opinion, not the cause of the problem.
>  ** We repeated the experiment running the Docker-only containers inside a Kubernetes node, linking the containers with the --net=container: mechanism as similarly as possible. The YCSB latency stayed the same.
>  * Disk I/O bottleneck: nodetool table stats are very similar (see the sketch after this list). The Cassandra containers are configured to write data to a filesystem that is mounted from the host into the container, and exactly the same Docker mount type is used.
>  ** Write latency is very stable over multiple runs:
>  *** Kubernetes, ycsb usertable: 0.0167 ms
>  *** Docker, ycsb usertable: 0.0150 ms
>  ** Compaction_history/compaction_in_progress is also very similar (see attached files)
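> The stats quoted above come from standard nodetool commands (a sketch; the container name is illustrative, and Cassandra 2.2 still calls the per-table report cfstats):
> {{# Per-table read/write latencies
> docker exec cassandra nodetool cfstats ycsb.usertable
> # Compaction history, as attached to this ticket
> docker exec cassandra nodetool compactionhistory}}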
> Do you know of any other causes that might explain the difference in reported YCSB response latency? Could it be that the Cassandra session is closed by Kubernetes after each request? How can I diagnose this?
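> One check would be to watch whether client connections to the native transport port stay established during the run (a sketch; assumes {{ss}} is available and the default port 9042; a stable connection count suggests sessions are reused rather than re-opened per request):
> {{watch -n 1 "ss -tn state established '( sport = :9042 )' | wc -l"}}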
>  


