Posted to user@cassandra.apache.org by "Durity, Sean R" <SE...@homedepot.com> on 2017/12/28 16:00:30 UTC
RE: [EXTERNAL] Lots of simultaneous connections?

Have you determined whether a specific query is the one timing out? It is possible that the query or data model does not scale well, especially if you are doing something like a full table scan.
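One way to answer that is to run each query the application issues, one at a time, and see which ones fail or run slowly. The following is a minimal, driver-agnostic sketch. It assumes each query is wrapped as a zero-argument callable, for example a lambda around `session.execute(...)`. The function name `profile_queries` and the 1,000 ms threshold are illustrative and do not come from any library.

```python
import time
from typing import Callable, Dict, List


def profile_queries(queries: Dict[str, Callable[[], object]],
                    slow_ms: float = 1000.0,
                    clock: Callable[[], float] = time.perf_counter) -> List[dict]:
    """Run each labelled query once; record latency and any exception raised.

    Hypothetical helper: `queries` maps a label to a zero-argument callable,
    e.g. lambda: session.execute("SELECT ...").
    """
    report = []
    for name, run in queries.items():
        start = clock()
        error = None
        try:
            run()
        except Exception as exc:  # a driver timeout would surface here
            error = type(exc).__name__
        elapsed_ms = (clock() - start) * 1000.0
        report.append({
            "query": name,
            "ms": round(elapsed_ms, 1),
            "error": error,
            "slow": error is not None or elapsed_ms >= slow_ms,
        })
    # Failed or slow queries first, then by descending latency
    report.sort(key=lambda r: (not r["slow"], -r["ms"]))
    return report
```

Running the same labelled set under normal and heavy load makes it easier to tell whether one query (say, an unbounded scan) degrades, or whether everything slows down together.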

It is also possible that your OS settings limit the number of connections to the host. Do you see any connections in the TIME_WAIT state in netstat? I agree that 5,000 connections per host seems on the high side. Each one consumes resources such as memory, so reducing the number of connections is a good idea.
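To check for TIME_WAIT quickly, the lines of `netstat -ant` output can be tallied by TCP state. This sketch makes two assumptions:

* the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State);
* Cassandra's default native-transport port, 9042.

The helper name `tcp_states_for_port` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tcp_states_for_port(netstat_lines: Iterable[str], port: int = 9042) -> Counter:
    """Tally TCP states of sockets whose local or foreign port is `port`.

    Assumes Linux net-tools `netstat -ant` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    wanted = str(port)
    states: Counter = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip header lines and anything that is not a TCP socket line
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, foreign_addr, state = fields[3], fields[4], fields[5]
        # rsplit on the last colon handles "10.0.0.5:9042" and ":::9042"
        ports = {local_addr.rsplit(":", 1)[-1], foreign_addr.rsplit(":", 1)[-1]}
        if wanted in ports:
            states[state] += 1
    return states
```

On a Linux host, the input could come from `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout.splitlines()`. BSD/macOS netstat separates the port with a dot rather than a colon, which this parser does not handle.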


Sean Durity

-----Original Message-----
From: Max Campos [mailto:mc_cassandra@core43.com]
Sent: Thursday, December 14, 2017 3:18 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Lots of simultaneous connections?

Hi -

We’re finally putting our new application under load, and under heavy load we’re starting to get this error message from the Python driver:

('Unable to connect to any servers', {'x.y.z.205': OperationTimedOut('errors=None, last_host=None',), 'x.y.z.204': OperationTimedOut('errors=None, last_host=None',), 'x.y.z.206': OperationTimedOut('errors=None, last_host=None',)})' (22.7s)

Our cluster runs Cassandra 3.0.6 on 3 nodes, with RF=3 and CL=QUORUM for reads and writes. We have a few thousand machines, each making 1-10 connections to C* at once. However, each of these connections only reads/writes a few records, waits several minutes, and then writes a few records. So while netstat reports ~5K connections per node, they’re generally idle. Peak reads/sec today was ~1500 per node, and peak writes/sec was ~300 per node. Read/write latencies peaked at 2.5ms.
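Here is a rough way to see where ~5K connections per node can come from. The sketch assumes the DataStax Python driver on native protocol v3 or later. In that setup, each `Cluster`/`Session` pair typically keeps one pooled connection to every node in the local datacenter, plus one control connection to a single node. Under that assumption, the per-node count scales with the total number of sessions across all machines, not with the replication factor. The function name and inputs are hypothetical, and the numbers in the test are illustrative rather than measured.

```python
def estimate_connections_per_node(machines: int,
                                  sessions_per_machine: float,
                                  nodes: int,
                                  conns_per_host: int = 1) -> float:
    """Back-of-the-envelope count of client sockets landing on one node.

    Assumptions (not measured): every session opens `conns_per_host`
    pooled connections to *every* node in the local DC, and each
    Cluster object adds one control connection to a single node,
    spread evenly across the cluster.
    """
    total_sessions = machines * sessions_per_machine
    pooled = total_sessions * conns_per_host  # every node sees every session
    control = total_sessions / nodes          # control conns shared across nodes
    return pooled + control
```

For example, 300 machines each holding 3 sessions against 3 nodes works out to 900 pooled connections plus about 300 control connections per node. Under these assumptions, adding nodes barely reduces the per-node count. Reducing the number of open sessions reduces it directly.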

Some questions:
1) Is anyone else out there making this many simultaneous connections? Any idea what a reasonable number of connections is, what counts as too many, etc.?

2) Any thoughts on which JMX metrics I should look at to better understand what exactly is exploding?  Is there a “number of active connections” metric?  We currently look at:
- client reads/writes per sec
- read/write latency
- compaction tasks
- repair tasks
- disk used by node
- disk used by table
- avg partition size per table

3) Any other advice?

I think I’ll try explicitly disconnecting during our application’s waiting period, so as to bring the C* connection count down. Hopefully that will solve the timeout problem.
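Here is a minimal sketch of that idea, assuming the DataStax Python driver (`cassandra-driver`). In that driver, `Cluster.connect()` opens the connection pools and `Cluster.shutdown()` closes them. The cluster is passed in as a factory so the function can be exercised without a live cluster. The function name `read_wait_write`, the keyspace, and the CQL statements are hypothetical placeholders.

```python
import time
from typing import Callable, List, Sequence


def read_wait_write(cluster_factory: Callable[[], object],
                    keyspace: str,
                    read_cql: str,
                    write_cql: str,
                    write_params: Sequence,
                    idle_seconds: float,
                    sleep: Callable[[float], None] = time.sleep) -> List:
    """Hold connections only while doing work, not during the idle wait.

    `cluster_factory` is expected to return a new cassandra.cluster.Cluster
    (or any object with connect()/shutdown()); names here are illustrative.
    """
    cluster = cluster_factory()
    try:
        session = cluster.connect(keyspace)
        rows = list(session.execute(read_cql))
    finally:
        cluster.shutdown()  # closes pooled and control connections

    sleep(idle_seconds)  # no sockets are held open to C* while waiting

    cluster = cluster_factory()
    try:
        session = cluster.connect(keyspace)
        session.execute(write_cql, write_params)
    finally:
        cluster.shutdown()
    return rows
```

With the real driver, the factory would be something like `lambda: Cluster(["x.y.z.204", "x.y.z.205", "x.y.z.206"])` after `from cassandra.cluster import Cluster`. The trade-off is that every reconnect opens a fresh control connection and re-reads cluster metadata. As a result, thousands of clients reconnecting at the same moment can generate a burst of load of their own.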

Thanks for your help.

- Max