Posted to issues@cloudstack.apache.org by "Sateesh Chodapuneedi (JIRA)" <ji...@apache.org> on 2014/07/14 16:18:05 UTC

[jira] [Comment Edited] (CLOUDSTACK-7012) [Atomation] Vcenter Hang during 4.4 automation runs

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060669#comment-14060669 ] 

Sateesh Chodapuneedi edited comment on CLOUDSTACK-7012 at 7/14/14 2:17 PM:
---------------------------------------------------------------------------

This appears to be a problem with the TCP stack of Windows 2008 (10.223.52.12 is running Windows 2008).
The following KB article describes the problem and the solution (a Windows hotfix).
http://kb.vmware.com/kb/2033822
I looked at the TCP connections on the vCenter server and observed many connections from loopback address port 8089 to other loopback addresses, as described in the above KB article.
This scenario can occur under heavy load.
================
vCenter Server returns 503 Service Unavailable errors (2033822)
Details

This problem affects only vCenter Servers running on Windows Vista or Windows Server 2008. Under a heavy load, some of the operations invoked on the vCenter Server fail and the error description indicates the HTTP error 503 Service Temporarily Unavailable. The exact error message might differ depending on the client, because it is generated by the client.

The vpxd log files contain entries that indicate that a socket connection attempt failed because it timed out. If you run netstat -an on the vCenter Server host machine immediately after the error, you will see many connections where one end is port 8085 on the loopback and the other end is another port on the loopback. Some of these connections will be in the TIME_WAIT state.
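
The netstat check described above can be automated. The following is a minimal sketch, assuming the four-column TCP layout that Windows `netstat -an` normally prints (Proto, Local Address, Foreign Address, State). The function name `count_loopback_states` and the `SAMPLE` text are illustrative inventions, not captured output from the affected server.

```python
from collections import Counter


def count_loopback_states(netstat_output, port=8085):
    """Count TCP states of loopback connections that have one end on `port`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto, Local Address, Foreign Address, State.
        # Header lines and UDP rows do not match and are skipped.
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_host, _, local_port = local.rpartition(":")
        remote_host, _, remote_port = remote.rpartition(":")
        # Keep only connections where both ends are on the IPv4 loopback.
        if local_host != "127.0.0.1" or remote_host != "127.0.0.1":
            continue
        if str(port) in (local_port, remote_port):
            counts[state] += 1
    return counts


# Invented sample text for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:8085         127.0.0.1:49231        TIME_WAIT
  TCP    127.0.0.1:49232        127.0.0.1:8085         ESTABLISHED
  TCP    127.0.0.1:8085         127.0.0.1:49233        TIME_WAIT
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING
  TCP    192.0.2.10:443         192.0.2.20:51000       ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_loopback_states(SAMPLE).items()):
        print(state, n)
```

A large and growing TIME_WAIT count for this port right after a 503 error would match the symptom the KB article describes.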

vCenter Server uses TCP connections on the loopback (localhost) for Remote Procedure Calls (RPC) to dispatch client requests and to communicate with vCenter Server companion services. As a result, under heavy loads, vCenter Server creates many local TCP connections, then closes them and opens new ones. Some of the closed connections remain open at the server side in the TIME_WAIT state for some time (four minutes with default Windows settings). Because the number of client-side ports is limited, if vCenter Server uses the connections fast enough, at some point the client side tries to reuse a port while the server side still has a connection for this client port in the TIME_WAIT state.
Normally, this situation should prompt the server to close the old connection and accept the new one. But on Windows Vista or Windows Server 2008, a documented flaw in the TCP stack might instead cause the server side to ignore the connection request. If this happens, the client retries several times and then times out. In this case, the vCenter Server dispatcher fails to contact the service and returns a 503 Service Unavailable error to the client, and the client request fails. 
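
As a rough illustration of why heavy load triggers the port reuse described above, the sketch below estimates the connection rate at which a client would wrap around its port range before the four-minute TIME_WAIT expires. The port range 49152-65535 is an assumption (the usual default dynamic range on Windows Vista and Server 2008, which can be reconfigured), and the model assumes ports are handed out one after another, so treat the result as an order-of-magnitude figure only.

```python
# Assumed default dynamic (ephemeral) port range; not stated in the KB text.
FIRST_DYNAMIC_PORT = 49152
LAST_DYNAMIC_PORT = 65535
DYNAMIC_PORTS = LAST_DYNAMIC_PORT - FIRST_DYNAMIC_PORT + 1

# TIME_WAIT duration with default Windows settings, per the KB text above.
TIME_WAIT_SECONDS = 4 * 60


def reuse_threshold_rate(ports, time_wait_seconds):
    """Connections per second above which a client port is reused
    while the server may still hold its old connection in TIME_WAIT."""
    return ports / time_wait_seconds


if __name__ == "__main__":
    rate = reuse_threshold_rate(DYNAMIC_PORTS, TIME_WAIT_SECONDS)
    print(f"{DYNAMIC_PORTS} ports / {TIME_WAIT_SECONDS} s = {rate:.1f} connections/s")
```

Under these assumptions, a sustained rate of roughly 68 new loopback connections per second is enough for a client port to come around again while the server side may still be in TIME_WAIT for it.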


was (Author: sateeshc):
This appears to be a problem with the TCP stack of Windows 2008 (10.223.52.12 is running Windows 2008).
The following KB article describes the problem and the solution (a Windows hotfix).
http://kb.vmware.com/kb/2033822
================
vCenter Server returns 503 Service Unavailable errors (2033822)
Details

This problem affects only vCenter Servers running on Windows Vista or Windows Server 2008. Under a heavy load, some of the operations invoked on the vCenter Server fail and the error description indicates the HTTP error 503 Service Temporarily Unavailable. The exact error message might differ depending on the client, because it is generated by the client.

The vpxd log files contain entries that indicate that a socket connection attempt failed because it timed out. If you run netstat -an on the vCenter Server host machine immediately after the error, you will see many connections where one end is port 8085 on the loopback and the other end is another port on the loopback. Some of these connections will be in the TIME_WAIT state.

vCenter Server uses TCP connections on the loopback (localhost) for Remote Procedure Calls (RPC) to dispatch client requests and to communicate with vCenter Server companion services. As a result, under heavy loads, vCenter Server creates many local TCP connections, then closes them and opens new ones. Some of the closed connections remain open at the server side in the TIME_WAIT state for some time (four minutes with default Windows settings). Because the number of client-side ports is limited, if vCenter Server uses the connections fast enough, at some point the client side tries to reuse a port while the server side still has a connection for this client port in the TIME_WAIT state.
Normally, this situation should prompt the server to close the old connection and accept the new one. But on Windows Vista or Windows Server 2008, a documented flaw in the TCP stack might instead cause the server side to ignore the connection request. If this happens, the client retries several times and then times out. In this case, the vCenter Server dispatcher fails to contact the service and returns a 503 Service Unavailable error to the client, and the client request fails. 

> [Atomation] Vcenter Hang during 4.4 automation runs
> ---------------------------------------------------
>
>                 Key: CLOUDSTACK-7012
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7012
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: VMware
>    Affects Versions: 4.4.0
>         Environment: VCenter 5.0
> ESXi 5.0
>            Reporter: Rayees Namathponnan
>            Assignee: Sateesh Chodapuneedi
>            Priority: Critical
>             Fix For: 4.4.0
>
>         Attachments: catalina.rar
>
>
> This issue was observed during a 4.4 automation run with vCenter 5.0. During the BVT run, VM deployment fails, and trying to connect to vCenter from the console gives the error below:
> Call "ServiceInstance.RetrieveContent" for object "ServiceInstance" on Server "10.223.52.12" failed.
> I have been using the same vCenter for more than 1.5 years and never faced this issue before. Last week I ran the automation with the 4.2 and 4.3 builds and did not observe this issue, so I think some of the changes in CS 4.4 are causing vCenter to hang.
> Observed the messages below in the MS log:
> INFO  [c.c.h.v.u.VmwareContext] (DirectAgentCronJob-455:ctx-71dce779) New VmwareContext object, current outstanding count: 451
> INFO  [c.c.h.v.r.VmwareResource] (DirectAgentCronJob-455:ctx-71dce779) Scan hung worker VM to recycle
> INFO  [c.c.h.v.u.VmwareContext] (DirectAgent-17:ctx-d197a13b 10.223.250.131) New VmwareContext object, current outstanding count: 452
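
The "current outstanding count" values in the log lines above can be pulled out of a management server log to see how the number changes over a run. This is a minimal sketch that assumes the message format stays exactly as quoted; the function name `outstanding_counts` is hypothetical. Whether this counter is related to the loopback connection buildup on vCenter is not established here.

```python
import re

# Matches the VmwareContext message exactly as quoted in the issue description.
COUNT_RE = re.compile(
    r"New VmwareContext object, current outstanding count: (\d+)"
)


def outstanding_counts(log_text):
    """Return the outstanding-count values in the order they appear."""
    return [int(m.group(1)) for m in COUNT_RE.finditer(log_text)]


# The three log lines quoted in the issue description.
LOG = """\
INFO  [c.c.h.v.u.VmwareContext] (DirectAgentCronJob-455:ctx-71dce779) New VmwareContext object, current outstanding count: 451
INFO  [c.c.h.v.r.VmwareResource] (DirectAgentCronJob-455:ctx-71dce779) Scan hung worker VM to recycle
INFO  [c.c.h.v.u.VmwareContext] (DirectAgent-17:ctx-d197a13b 10.223.250.131) New VmwareContext object, current outstanding count: 452
"""

if __name__ == "__main__":
    counts = outstanding_counts(LOG)
    print("values:", counts)
    print("max:", max(counts))
```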



--
This message was sent by Atlassian JIRA
(v6.2#6252)