Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2018/05/09 19:26:40 UTC

[GitHub] PaulAngus opened a new issue #2633: SSVM cannot reconnect after connection disruption if there is an active event.

PaulAngus opened a new issue #2633: SSVM cannot reconnect after connection disruption if there is an active event.
URL: https://github.com/apache/cloudstack/issues/2633
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   System VMs
   (Maybe also KVM hosts)
   ~~~
   
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master branch.
   -->
   ~~~
   4.11.0
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, advanced networking, etc.  N/A otherwise
   -->
   
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   4.11.0 environment with VMware 
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   
   If mgmt server <-> agent communication is interrupted while an action is in progress (for example, the mgmt server restarts while the SSVM is performing a snapshot), the SSVM cannot reconnect and logs the following error:
   2018-05-09 11:37:09,403 INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
   2018-05-09 11:37:09,404 INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.
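
To make the reported behaviour concrete, here is a minimal sketch of a reconnect guard that would produce the second log line. It is an assumption based only on the log messages: the class name `ReconnectGuardSketch` and its methods are hypothetical and are not taken from the CloudStack agent source.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the behaviour seen in the log above; not CloudStack's Agent class.
class ReconnectGuardSketch {
    // Number of commands the agent has accepted but not yet finished.
    private final AtomicInteger inProgress = new AtomicInteger();

    void commandStarted() {
        inProgress.incrementAndGet();
    }

    void commandFinished() {
        inProgress.decrementAndGet();
    }

    // Mirrors the reported behaviour: refuse to reconnect while any command is running.
    boolean tryReconnect() {
        int pending = inProgress.get();
        if (pending > 0) {
            System.out.println("Cannot connect because we still have " + pending + " commands in progress.");
            return false;
        }
        System.out.println("Reconnecting to management server.");
        return true;
    }

    public static void main(String[] args) {
        ReconnectGuardSketch agent = new ReconnectGuardSketch();
        agent.commandStarted();   // e.g. a long-running snapshot export begins
        agent.tryReconnect();     // refused: one command is still in progress
        agent.commandFinished();  // the snapshot job ends
        agent.tryReconnect();     // allowed once nothing is pending
    }
}
```

If the agent works roughly like this, a single long-running command keeps the SSVM disconnected for its whole duration, which matches what is described under ACTUAL RESULTS.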
   
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   <!-- Paste example playbooks or commands between quotes below -->
   ~~~
   While a volume snapshot is exporting the OVF, restart the management server.
   ~~~
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   ~~~
   SSVM reconnects.
   ~~~
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   <!-- Paste verbatim command output between quotes below -->
   ~~~
   The storage VM does not reconnect to the management server and logs an error such as:
   INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
   INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.
   
   Once the job has finished, the SSVM reconnects, but until then all other jobs fail unless another secondary storage VM is up and running.
   Even though the backup job is forced to complete on secondary storage, it is left in the DB in the backing-up state forever, so it does not make sense that the agent even waits for it to finish.
   ~~~
   
