You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Yunkai Zhang (JIRA)" <ji...@apache.org> on 2013/08/20 10:02:53 UTC

[jira] [Updated] (TS-2141) Inconsistent euid cause bad mgmtapisocket/eventapisocket

     [ https://issues.apache.org/jira/browse/TS-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yunkai Zhang updated TS-2141:
-----------------------------

    Description: 
In the main function of traffic_manager, *webIntr_main* thread is created after euid has been changed to "proxy.config.admin.user_id".

And then, webIntr_main thread will create mgmtapisocket/eventapisocket socket fd, and listen on them.

But unfortunately, after created webIntr_main thread, the main thread will call listenForProxy()->bindProxyPort() immediately, which might change/restore *euid* concurrently.

For example:
1) bindProxyPort() change the euid to 0(root uid).
2) webIntr_main creates mgmtapisocket with 0 euid.
3) bindProxyPort() restore the euid to 501(admin.user_id)
4) webIntr_main calls chmod(0777) on mgmtapisocket, it'll get "Operation not permitted" error and other unexpected errors.

As the api sockets can't create correctly, traffic_cop will failed to establish heartbeat with traffic_manager, and than traffic_cop will kill traffic_manager automatically.

But more worse, after killed and restarted manager, traffic_cop forgets to reconnect to manager, so it will use bad main_socket_fd to detect heartbeat of manager. Of course, it'll failed and than kill/restart manager again and again and again:

{code}
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Aug 15 18:09:20.730] {0x7ffb873aa7e0} STATUS: opened /var/log/trafficserver/manager.log
[Aug 15 18:09:20.730] {0x7ffb873aa7e0} NOTE: updated diags config
[Aug 15 18:09:20.732] Manager {0x7ffb873aa7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
[Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [newUNIXsocket] Unable to chmod unix-domain socket: Operation not permitted
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
[Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [WebIntrMain] Unable to set up so for handling managment API event calls. Event Socket path: /var/run/tr>
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [TrafficManager] Setup complete
[Aug 15 18:09:21.751] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::startProxy] Launching ts process
[TrafficServer] using root directory '/usr'
[Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
[Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [Alarms::signalAlarm] Server Process born
[Aug 15 18:09:22.775] {0x2b4218293740} STATUS: opened /var/log/trafficserver/diags.log
[Aug 15 18:09:23.171] {0x2b4218293740} NOTE: updated diags config
[Aug 15 18:09:23.175] Server {0x2b4218293740} NOTE: cache clustering disabled
[Aug 15 18:09:23.176] Server {0x2b4218293740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
[Aug 15 18:09:23.202] Server {0x2b4218293740} NOTE: cache clustering disabled
[Aug 15 18:09:23.202] Server {0x2b4218293740} WARNING: unable to open cache disk(s): SSD Cache Disabled
[Aug 15 18:09:23.214] Server {0x2b4218293740} NOTE: logging initialized[15], logging_mode = 3
[Aug 15 18:09:23.215] Server {0x2b4218293740} WARNING: Access logging to local log directory suspended - partition space is low.
[Aug 15 18:09:23.248] Server {0x2b4218293740} NOTE: traffic server running
[Aug 15 18:09:39.200] Server {0x2b421a620700} NOTE: cache enabled
FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Aug 15 18:09:52.744] {0x7f918588f7e0} STATUS: opened /var/log/trafficserver/manager.log
[Aug 15 18:09:52.744] {0x7f918588f7e0} NOTE: updated diags config
[Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
[Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [TrafficManager] Setup complete
[Aug 15 18:09:53.765] Manager {0x7f918588f7e0} NOTE: [LocalManager::startProxy] Launching ts process
[TrafficServer] using root directory '/usr'
[Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '12'
[Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [Alarms::signalAlarm] Server Process born
[Aug 15 18:09:54.788] {0x2acfdbb59740} STATUS: opened /var/log/trafficserver/diags.log
[Aug 15 18:09:54.789] {0x2acfdbb59740} NOTE: updated diags config
[Aug 15 18:09:54.792] Server {0x2acfdbb59740} NOTE: cache clustering disabled
[Aug 15 18:09:54.794] Server {0x2acfdbb59740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
[Aug 15 18:09:54.820] Server {0x2acfdbb59740} NOTE: cache clustering disabled
[Aug 15 18:09:54.820] Server {0x2acfdbb59740} WARNING: unable to open cache disk(s): SSD Cache Disabled
[Aug 15 18:09:54.833] Server {0x2acfdbb59740} NOTE: logging initialized[15], logging_mode = 3
[Aug 15 18:09:54.833] Server {0x2acfdbb59740} WARNING: Access logging to local log directory suspended - partition space is low.
[Aug 15 18:09:54.868] Server {0x2acfdbb59740} NOTE: traffic server running
[Aug 15 18:10:10.729] Server {0x2acfe1f1e700} NOTE: cache enabled
FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
....
{code}



  was:
In the main function of traffic_manager, *webIntr_main* thread is created after euid has been changed to "proxy.config.admin.user_id".

And then, webIntr_main thread will create mgmtapisocket/eventapisocket socket fd, and listen on them.

But unfortunately, after created webIntr_main thread, the main thread will call listenForProxy()->bindProxyPort() immediately, which might change/restore *euid* concurrently.

For example:
1) bindProxyPort() change the euid to 0(root uid).
2) webIntr_main creates mgmtapisocket with 0 euid.
3) bindProxyPort() restore the euid to 501(admin.user_id)
4) webIntr_main calls chmod(0777) on mgmtapisocket, it'll get "No Permission" error.

As the api sockets can't create correctly, traffic_cop will failed to establish heartbeat with traffic_manager, and than traffic_cop will kill traffic_manager automatically.

But more worse, after killed and restarted manager, traffic_cop forgets to reconnect to manager, so it will use bad main_socket_fd to detect heartbeat of manager. Of course, it'll failed and than kill/restart manager again and again and again:

{code}
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Aug 15 18:09:20.730] {0x7ffb873aa7e0} STATUS: opened /var/log/trafficserver/manager.log
[Aug 15 18:09:20.730] {0x7ffb873aa7e0} NOTE: updated diags config
[Aug 15 18:09:20.732] Manager {0x7ffb873aa7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
[Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [newUNIXsocket] Unable to chmod unix-domain socket: Operation not permitted
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
[Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [WebIntrMain] Unable to set up so for handling managment API event calls. Event Socket path: /var/run/tr>
[Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [TrafficManager] Setup complete
[Aug 15 18:09:21.751] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::startProxy] Launching ts process
[TrafficServer] using root directory '/usr'
[Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
[Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [Alarms::signalAlarm] Server Process born
[Aug 15 18:09:22.775] {0x2b4218293740} STATUS: opened /var/log/trafficserver/diags.log
[Aug 15 18:09:23.171] {0x2b4218293740} NOTE: updated diags config
[Aug 15 18:09:23.175] Server {0x2b4218293740} NOTE: cache clustering disabled
[Aug 15 18:09:23.176] Server {0x2b4218293740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
[Aug 15 18:09:23.202] Server {0x2b4218293740} NOTE: cache clustering disabled
[Aug 15 18:09:23.202] Server {0x2b4218293740} WARNING: unable to open cache disk(s): SSD Cache Disabled
[Aug 15 18:09:23.214] Server {0x2b4218293740} NOTE: logging initialized[15], logging_mode = 3
[Aug 15 18:09:23.215] Server {0x2b4218293740} WARNING: Access logging to local log directory suspended - partition space is low.
[Aug 15 18:09:23.248] Server {0x2b4218293740} NOTE: traffic server running
[Aug 15 18:09:39.200] Server {0x2b421a620700} NOTE: cache enabled
FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
[E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
[Aug 15 18:09:52.744] {0x7f918588f7e0} STATUS: opened /var/log/trafficserver/manager.log
[Aug 15 18:09:52.744] {0x7f918588f7e0} NOTE: updated diags config
[Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
[Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
[Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [TrafficManager] Setup complete
[Aug 15 18:09:53.765] Manager {0x7f918588f7e0} NOTE: [LocalManager::startProxy] Launching ts process
[TrafficServer] using root directory '/usr'
[Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '12'
[Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [Alarms::signalAlarm] Server Process born
[Aug 15 18:09:54.788] {0x2acfdbb59740} STATUS: opened /var/log/trafficserver/diags.log
[Aug 15 18:09:54.789] {0x2acfdbb59740} NOTE: updated diags config
[Aug 15 18:09:54.792] Server {0x2acfdbb59740} NOTE: cache clustering disabled
[Aug 15 18:09:54.794] Server {0x2acfdbb59740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
[Aug 15 18:09:54.820] Server {0x2acfdbb59740} NOTE: cache clustering disabled
[Aug 15 18:09:54.820] Server {0x2acfdbb59740} WARNING: unable to open cache disk(s): SSD Cache Disabled
[Aug 15 18:09:54.833] Server {0x2acfdbb59740} NOTE: logging initialized[15], logging_mode = 3
[Aug 15 18:09:54.833] Server {0x2acfdbb59740} WARNING: Access logging to local log directory suspended - partition space is low.
[Aug 15 18:09:54.868] Server {0x2acfdbb59740} NOTE: traffic server running
[Aug 15 18:10:10.729] Server {0x2acfe1f1e700} NOTE: cache enabled
FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
....
{code}



    
> Inconsistent euid cause bad mgmtapisocket/eventapisocket
> --------------------------------------------------------
>
>                 Key: TS-2141
>                 URL: https://issues.apache.org/jira/browse/TS-2141
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Management, Management API
>            Reporter: Yunkai Zhang
>
> In the main function of traffic_manager, *webIntr_main* thread is created after euid has been changed to "proxy.config.admin.user_id".
> And then, webIntr_main thread will create mgmtapisocket/eventapisocket socket fd, and listen on them.
> But unfortunately, after created webIntr_main thread, the main thread will call listenForProxy()->bindProxyPort() immediately, which might change/restore *euid* concurrently.
> For example:
> 1) bindProxyPort() change the euid to 0(root uid).
> 2) webIntr_main creates mgmtapisocket with 0 euid.
> 3) bindProxyPort() restore the euid to 501(admin.user_id)
> 4) webIntr_main calls chmod(0777) on mgmtapisocket, it'll get "Operation not permitted" error and other unexpected errors.
> As the api sockets can't create correctly, traffic_cop will failed to establish heartbeat with traffic_manager, and than traffic_cop will kill traffic_manager automatically.
> But more worse, after killed and restarted manager, traffic_cop forgets to reconnect to manager, so it will use bad main_socket_fd to detect heartbeat of manager. Of course, it'll failed and than kill/restart manager again and again and again:
> {code}
> [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
> [Aug 15 18:09:20.730] {0x7ffb873aa7e0} STATUS: opened /var/log/trafficserver/manager.log
> [Aug 15 18:09:20.730] {0x7ffb873aa7e0} NOTE: updated diags config
> [Aug 15 18:09:20.732] Manager {0x7ffb873aa7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
> [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
> [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
> [Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [newUNIXsocket] Unable to chmod unix-domain socket: Operation not permitted
> [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
> [Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [WebIntrMain] Unable to set up so for handling managment API event calls. Event Socket path: /var/run/tr>
> [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [TrafficManager] Setup complete
> [Aug 15 18:09:21.751] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::startProxy] Launching ts process
> [TrafficServer] using root directory '/usr'
> [Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
> [Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [Alarms::signalAlarm] Server Process born
> [Aug 15 18:09:22.775] {0x2b4218293740} STATUS: opened /var/log/trafficserver/diags.log
> [Aug 15 18:09:23.171] {0x2b4218293740} NOTE: updated diags config
> [Aug 15 18:09:23.175] Server {0x2b4218293740} NOTE: cache clustering disabled
> [Aug 15 18:09:23.176] Server {0x2b4218293740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
> [Aug 15 18:09:23.202] Server {0x2b4218293740} NOTE: cache clustering disabled
> [Aug 15 18:09:23.202] Server {0x2b4218293740} WARNING: unable to open cache disk(s): SSD Cache Disabled
> [Aug 15 18:09:23.214] Server {0x2b4218293740} NOTE: logging initialized[15], logging_mode = 3
> [Aug 15 18:09:23.215] Server {0x2b4218293740} WARNING: Access logging to local log directory suspended - partition space is low.
> [Aug 15 18:09:23.248] Server {0x2b4218293740} NOTE: traffic server running
> [Aug 15 18:09:39.200] Server {0x2b421a620700} NOTE: cache enabled
> FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
> [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
> [Aug 15 18:09:52.744] {0x7f918588f7e0} STATUS: opened /var/log/trafficserver/manager.log
> [Aug 15 18:09:52.744] {0x7f918588f7e0} NOTE: updated diags config
> [Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
> [Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
> [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
> [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
> [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [TrafficManager] Setup complete
> [Aug 15 18:09:53.765] Manager {0x7f918588f7e0} NOTE: [LocalManager::startProxy] Launching ts process
> [TrafficServer] using root directory '/usr'
> [Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '12'
> [Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [Alarms::signalAlarm] Server Process born
> [Aug 15 18:09:54.788] {0x2acfdbb59740} STATUS: opened /var/log/trafficserver/diags.log
> [Aug 15 18:09:54.789] {0x2acfdbb59740} NOTE: updated diags config
> [Aug 15 18:09:54.792] Server {0x2acfdbb59740} NOTE: cache clustering disabled
> [Aug 15 18:09:54.794] Server {0x2acfdbb59740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
> [Aug 15 18:09:54.820] Server {0x2acfdbb59740} NOTE: cache clustering disabled
> [Aug 15 18:09:54.820] Server {0x2acfdbb59740} WARNING: unable to open cache disk(s): SSD Cache Disabled
> [Aug 15 18:09:54.833] Server {0x2acfdbb59740} NOTE: logging initialized[15], logging_mode = 3
> [Aug 15 18:09:54.833] Server {0x2acfdbb59740} WARNING: Access logging to local log directory suspended - partition space is low.
> [Aug 15 18:09:54.868] Server {0x2acfdbb59740} NOTE: traffic server running
> [Aug 15 18:10:10.729] Server {0x2acfe1f1e700} NOTE: cache enabled
> FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
> ....
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira