Posted to dev@ambari.apache.org by "David Miller (JIRA)" <ji...@apache.org> on 2016/01/07 21:03:40 UTC
[jira] [Created] (AMBARI-14580) ams collector clients likely to create self simultaneous open tcp sockets
David Miller created AMBARI-14580:
-------------------------------------
Summary: ams collector clients likely to create self simultaneous open tcp sockets
Key: AMBARI-14580
URL: https://issues.apache.org/jira/browse/AMBARI-14580
Project: Ambari
Issue Type: Bug
Components: ambari-metrics
Affects Versions: 2.1.0
Environment: IBM BigInsights 4.1
Reporter: David Miller
Multiple clients connect to the ambari metrics timeline metrics service.
The timeline.metrics.service.webapp.address property in the Advanced ams-site configuration section specifies the collector port, which defaults to 6188.
Many of these clients run on the same host as the collector. If the ambari metrics collector is not listening on this port (such as when it is stopped), a client can create a self simultaneous open TCP connection: the OS assigns the client a local port equal to the collector port, and the socket ends up connected to itself. See http://stackoverflow.com/questions/5139808/tcp-simultaneous-open-and-self-connect-prevention for a discussion of this condition.
Once this condition is triggered, the ams collector cannot start because the port is now held by the client which tried to connect to it.
Any client which connects to itself expecting to connect to the ams collector appears to hold this connection forever.
We have seen this condition happen twice by accident, and we can reproduce it. While this condition is possible for any connection with the same remote and local address, it appears especially likely to happen with connections to the ams collector, probably because the collector usually runs on the same machine as many other services which try to connect to it.
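A client can detect this condition by comparing the two endpoints of its own socket. The following is a minimal sketch, not code from any ams client; the helper names is_self_connected and connect_checked are hypothetical, and the 5 second timeout is an arbitrary assumption.

```python
import socket


def is_self_connected(sock):
    """Return True if the socket's local and remote endpoints are identical."""
    return sock.getsockname() == sock.getpeername()


def connect_checked(host, port, timeout=5.0):
    """Connect with a timeout and reject a connection that loops back to itself.

    Hypothetical helper: closing the socket releases the port so that the
    real listener can bind to it later.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connected(sock):
        sock.close()
        raise ConnectionError("self-connect detected on %s:%d" % (host, port))
    return sock
```

A client using such a check would close the socket and retry later instead of holding the collector port indefinitely.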
To reproduce the problem:
1. Stop the ambari metrics collector.
2. Wait an unspecified amount of time (hours or days) and check netstat for self simultaneous open connections, which have the same local and remote host:port tuple, like the one below:
a. tcp 0 0 10.93.132.110:6188 10.93.132.110:6188 ESTABLISHED -
3. Attempt to start the ambari metrics collector. It will fail with an error line:
Caused by: java.net.BindException: Port in use: 0.0.0.0:6188
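The netstat check in step 2 can be automated. The sketch below assumes the column layout of the netstat line quoted above (protocol, two queue counters, local address, remote address, state); the function name find_self_connections is hypothetical, and the second sample row is made up for contrast.

```python
def find_self_connections(netstat_output):
    """Return (address, state) pairs for TCP rows whose local and remote host:port match."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local remote state [...]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local == remote:
            hits.append((local, state))
    return hits


# First row is the line from the report; the second is an illustrative listener.
sample = """\
tcp 0 0 10.93.132.110:6188 10.93.132.110:6188 ESTABLISHED -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
"""

print(find_self_connections(sample))
# prints [('10.93.132.110:6188', 'ESTABLISHED')]
```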
Possible Solutions:
* Change collector clients to time out connections when no response or an unexpected response is received (the connected-to-self scenario)
* Enable SO_REUSEADDR to possibly decrease the chances of selecting the same local port as the remote port
* Recommend that users reconfigure their OS's ephemeral port range to not include the collector listener port
* Increase the reconnect wait time when connecting to the collector
* Others?
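For the ephemeral port range suggestion, a quick check can tell whether the collector port is exposed on a given host. This is a minimal sketch assuming Linux, where /proc/sys/net/ipv4/ip_local_port_range holds the low and high bounds as two whitespace-separated integers. The function names are hypothetical, and the range values in the usage lines are illustrative rather than taken from the affected systems.

```python
def parse_port_range(text):
    """Parse 'low high' as found in ip_local_port_range (whitespace-separated)."""
    low, high = (int(part) for part in text.split())
    return low, high


def port_in_ephemeral_range(port, range_text):
    """Return True if the port could be picked as a local (ephemeral) port."""
    low, high = parse_port_range(range_text)
    return low <= port <= high


def read_local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the current range on a Linux host (assumed path, Linux only)."""
    with open(path) as handle:
        return handle.read()


# Illustrative ranges, not measurements from the reported environment.
print(port_in_ephemeral_range(6188, "32768\t60999"))  # prints False
print(port_in_ephemeral_range(6188, "1024 65000"))    # prints True
```

If the check returns True for the configured collector port, a self-connect on that port is possible on that host.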
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)