Posted to commits@cloudstack.apache.org by GitBox <gi...@apache.org> on 2021/12/16 09:20:06 UTC

[GitHub] [cloudstack] ravening opened a new pull request #5783: 4.16 kvm storage issues

ravening opened a new pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783


   ### Description
   
   This PR provides multiple options to handle storage issues on KVM.
   
   Ported from https://github.com/apache/cloudstack/pull/4708
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### Screenshots (if appropriate):
   
   
   ### How Has This Been Tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [cloudstack] DaanHoogland commented on a change in pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
DaanHoogland commented on a change in pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#discussion_r771279402



##########
File path: plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
##########
@@ -1311,6 +1316,26 @@ public boolean configureHostParams(final Map<String, String> params) {
         return true;
     }
 
+    public void configureHeartBeatParams(final Map<String, String> params) {
+        Long heartBeatUpdateMaxRetries = null;
+        Long heartBeatUpdateRetrySleep = null;
+        Long heartBeatUpdateTimeout = null;
+        KVMHAMonitor.HeartBeatAction heartBeatFailureAction = null;
+        if (params.get(KVM_HEARTBEAT_UPDATE_MAX_RETRIES) != null) {
+            heartBeatUpdateMaxRetries = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_MAX_RETRIES));
+        }
+        if (params.get(KVM_HEARTBEAT_UPDATE_RETRY_SLEEP) != null) {
+            heartBeatUpdateRetrySleep = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_RETRY_SLEEP));
+        }
+        if (params.get(KVM_HEARTBEAT_UPDATE_TIMEOUT) != null) {
+            heartBeatUpdateTimeout = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_TIMEOUT));
+        }
+        if (params.get(KVM_HEARTBEAT_FAILURE_ACTION) != null) {
+            heartBeatFailureAction = KVMHAMonitor.HeartBeatAction.valueOf(params.get(KVM_HEARTBEAT_FAILURE_ACTION).toUpperCase());
+        }

Review comment:
       never mind, not fully awake







[GitHub] [cloudstack] DaanHoogland commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
DaanHoogland commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996058246


   @ravening sounds like this could go on 4.16. Would you?





[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996448632


   Packaging result: :heavy_check_mark: el7 :heavy_multiplication_x: el8 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 1930





[GitHub] [cloudstack] sureshanaparti commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
sureshanaparti commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996422774


   @blueorangutan package





[GitHub] [cloudstack] sureshanaparti commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
sureshanaparti commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996652606


   @blueorangutan package





[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996681512


   @sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests





[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-997062430


   <b>Trillian test result (tid-2678)</b>
   Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 34214 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5783-t2678-kvm-centos7.zip
   Smoke tests completed. 90 look OK, 1 have errors
   Only failed tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   test_hostha_enable_ha_when_host_disabled | `Error` | 4.66 | test_hostha_kvm.py
   test_hostha_enable_ha_when_host_in_maintenance | `Error` | 302.80 | test_hostha_kvm.py
   





[GitHub] [cloudstack] ravening commented on a change in pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
ravening commented on a change in pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#discussion_r774396257



##########
File path: agent/src/main/java/com/cloud/agent/properties/AgentProperties.java
##########
@@ -39,14 +39,6 @@
      */
     public static final Property<Integer> VM_MIGRATE_DOMAIN_RETRIEVE_TIMEOUT = new Property<Integer>("vm.migrate.domain.retrieve.timeout", 10);
 
-    /**
-     * Reboot host and alert management on heartbeat timeout. <br>
-     * Data type: boolean.<br>
-     * Default value: true.
-     */
-    public static final Property<Boolean> REBOOT_HOST_AND_ALERT_MANAGEMENT_ON_HEARTBEAT_TIMEOUT
-        = new Property<Boolean>("reboot.host.and.alert.management.on.heartbeat.timeout", true);

Review comment:
       The default action is to reboot the system using the `-c` flag, so I will just remove it from the `agent.properties` file.







[GitHub] [cloudstack] DaanHoogland commented on a change in pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
DaanHoogland commented on a change in pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#discussion_r771279093



##########
File path: plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
##########
@@ -1311,6 +1316,26 @@ public boolean configureHostParams(final Map<String, String> params) {
         return true;
     }
 
+    public void configureHeartBeatParams(final Map<String, String> params) {
+        Long heartBeatUpdateMaxRetries = null;
+        Long heartBeatUpdateRetrySleep = null;
+        Long heartBeatUpdateTimeout = null;
+        KVMHAMonitor.HeartBeatAction heartBeatFailureAction = null;
+        if (params.get(KVM_HEARTBEAT_UPDATE_MAX_RETRIES) != null) {
+            heartBeatUpdateMaxRetries = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_MAX_RETRIES));
+        }
+        if (params.get(KVM_HEARTBEAT_UPDATE_RETRY_SLEEP) != null) {
+            heartBeatUpdateRetrySleep = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_RETRY_SLEEP));
+        }
+        if (params.get(KVM_HEARTBEAT_UPDATE_TIMEOUT) != null) {
+            heartBeatUpdateTimeout = Long.parseLong(params.get(KVM_HEARTBEAT_UPDATE_TIMEOUT));
+        }
+        if (params.get(KVM_HEARTBEAT_FAILURE_ACTION) != null) {
+            heartBeatFailureAction = KVMHAMonitor.HeartBeatAction.valueOf(params.get(KVM_HEARTBEAT_FAILURE_ACTION).toUpperCase());
+        }

Review comment:
       could we use a `switch` statement here (or `else if`)?
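Each of the four keys is looked up independently, and any subset of them may be present. A `switch` or `else if` chain would therefore stop at the first key it finds and skip the rest, so the separate `if` blocks are needed. What can be reduced is the repetition, by moving it into small helpers. Below is a minimal sketch of that approach. It is not the PR's code: the key strings, the helper names `parseOptionalLong` and `parseOptionalAction`, and the stand-in `HeartBeatAction` enum are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HeartBeatParamsSketch {
    // Hypothetical stand-in for KVMHAMonitor.HeartBeatAction, using the values named in this review.
    public enum HeartBeatAction { HARDRESET, STOPAGENT, DESTROYVMS, NOACTION }

    // Placeholder keys: the real values of the KVM_HEARTBEAT_* constants are not shown in the diff.
    static final String MAX_RETRIES = "example.heartbeat.max.retries";
    static final String RETRY_SLEEP = "example.heartbeat.retry.sleep";
    static final String TIMEOUT = "example.heartbeat.timeout";
    static final String FAILURE_ACTION = "example.heartbeat.failure.action";

    // Returns the parsed value, or null when the key is absent (unset settings stay null).
    public static Long parseOptionalLong(Map<String, String> params, String key) {
        String value = params.get(key);
        return value == null ? null : Long.parseLong(value.trim());
    }

    // Same pattern for the enum-valued setting; valueOf throws on unknown names.
    public static HeartBeatAction parseOptionalAction(Map<String, String> params, String key) {
        String value = params.get(key);
        return value == null ? null : HeartBeatAction.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(MAX_RETRIES, "5");
        params.put(FAILURE_ACTION, "stopagent");
        // Every key is checked on its own, so none is skipped when an earlier one is present.
        Long maxRetries = parseOptionalLong(params, MAX_RETRIES);
        Long retrySleep = parseOptionalLong(params, RETRY_SLEEP);
        Long timeout = parseOptionalLong(params, TIMEOUT);
        HeartBeatAction action = parseOptionalAction(params, FAILURE_ACTION);
        System.out.println(maxRetries + " " + retrySleep + " " + timeout + " " + action); // 5 null null STOPAGENT
    }
}
```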







[GitHub] [cloudstack] sureshanaparti commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
sureshanaparti commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996681256


   @blueorangutan test





[GitHub] [cloudstack] ravening commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
ravening commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996077473


   > @ravening sounds like this could go on 4.16. Would you?
   
   @DaanHoogland it's a new feature, so it will go to main.





[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996652846


   @sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.





[GitHub] [cloudstack] GutoVeronezi commented on a change in pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
GutoVeronezi commented on a change in pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#discussion_r773186546



##########
File path: server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
##########
@@ -459,6 +463,20 @@
     public static final ConfigKey<Boolean> MIGRATE_VM_ACROSS_CLUSTERS = new ConfigKey<Boolean>(Boolean.class, "migrate.vm.across.clusters", "Advanced", "false",
             "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",true, ConfigKey.Scope.Zone, null);
 
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_MAX_RETRIES_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_MAX_RETRIES, "5",
+            "The maximum retries of kvm heartbeat to write to storage",

Review comment:
       ```suggestion
               "The maximum retries of KVM heartbeat to write to storage.",
   ```

##########
File path: server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
##########
@@ -459,6 +463,20 @@
     public static final ConfigKey<Boolean> MIGRATE_VM_ACROSS_CLUSTERS = new ConfigKey<Boolean>(Boolean.class, "migrate.vm.across.clusters", "Advanced", "false",
             "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",true, ConfigKey.Scope.Zone, null);
 
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_MAX_RETRIES_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_MAX_RETRIES, "5",
+            "The maximum retries of kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_RETRY_SLEEP_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_RETRY_SLEEP, "10000",
+            "The sleep time, in milliseconds, between two kvm heartbeats to write to storage",
+            true, ConfigKey.Scope.Global);
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_TIMEOUT_CK = new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_TIMEOUT, "60000",
+            "Timeout(in milliseconds) that kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+    public static final ConfigKey<String> KVM_HEARTBEAT_FAILURE_ACTION_CK = new ConfigKey<>("Advanced", String.class, KVM_HEARTBEAT_FAILURE_ACTION, "hardreset",
+            "The action for heartbeat write failures on KVM host. The valid value are 'hardreset' (default), 'stopagent', 'destroyvms'",

Review comment:
       ```suggestion
               "The action for heartbeat write failures on KVM host. The valid values are 'hardreset' (default), 'stopagent', 'destroyvms'",
   ```

##########
File path: server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
##########
@@ -459,6 +463,20 @@
     public static final ConfigKey<Boolean> MIGRATE_VM_ACROSS_CLUSTERS = new ConfigKey<Boolean>(Boolean.class, "migrate.vm.across.clusters", "Advanced", "false",
             "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",true, ConfigKey.Scope.Zone, null);
 
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_MAX_RETRIES_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_MAX_RETRIES, "5",
+            "The maximum retries of kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_RETRY_SLEEP_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_RETRY_SLEEP, "10000",
+            "The sleep time, in milliseconds, between two kvm heartbeats to write to storage",

Review comment:
       ```suggestion
               "The sleep time, in milliseconds, between two KVM heartbeats to write to storage.",
   ```

##########
File path: server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
##########
@@ -459,6 +463,20 @@
     public static final ConfigKey<Boolean> MIGRATE_VM_ACROSS_CLUSTERS = new ConfigKey<Boolean>(Boolean.class, "migrate.vm.across.clusters", "Advanced", "false",
             "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",true, ConfigKey.Scope.Zone, null);
 
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_MAX_RETRIES_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_MAX_RETRIES, "5",
+            "The maximum retries of kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_RETRY_SLEEP_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_RETRY_SLEEP, "10000",
+            "The sleep time, in milliseconds, between two kvm heartbeats to write to storage",
+            true, ConfigKey.Scope.Global);
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_TIMEOUT_CK = new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_TIMEOUT, "60000",
+            "Timeout(in milliseconds) that kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+    public static final ConfigKey<String> KVM_HEARTBEAT_FAILURE_ACTION_CK = new ConfigKey<>("Advanced", String.class, KVM_HEARTBEAT_FAILURE_ACTION, "hardreset",
+            "The action for heartbeat write failures on KVM host. The valid value are 'hardreset' (default), 'stopagent', 'destroyvms'",

Review comment:
       The value `noaction` should be listed explicitly here too.
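One way to stop the description from drifting out of sync with the accepted values is to build the list from the enum itself. Below is a minimal sketch of that idea. It assumes a stand-in `HeartBeatAction` enum holding the four values named in this review (`hardreset`, `stopagent`, `destroyvms`, `noaction`). The class and method names are hypothetical, and the PR may validate the setting differently.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

public class FailureActionSketch {
    // Hypothetical enum mirroring the values discussed in the review; HARDRESET is the stated default.
    public enum HeartBeatAction { HARDRESET, STOPAGENT, DESTROYVMS, NOACTION }

    // Derives the list of valid values from the enum, so 'noaction' cannot be left out by accident.
    public static String validValues() {
        return Arrays.stream(HeartBeatAction.values())
                .map(a -> "'" + a.name().toLowerCase(Locale.ROOT) + "'")
                .collect(Collectors.joining(", "));
    }

    public static String description() {
        return "The action for heartbeat write failures on KVM host. The valid values are "
                + validValues() + "; the default is 'hardreset'.";
    }

    // Case-insensitive parse; unknown values are rejected with a message naming every valid value.
    public static HeartBeatAction parse(String value) {
        try {
            return HeartBeatAction.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Invalid heartbeat failure action '" + value
                    + "'; valid values are " + validValues(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(description());
        System.out.println(parse("NoAction"));
    }
}
```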

##########
File path: server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
##########
@@ -459,6 +463,20 @@
     public static final ConfigKey<Boolean> MIGRATE_VM_ACROSS_CLUSTERS = new ConfigKey<Boolean>(Boolean.class, "migrate.vm.across.clusters", "Advanced", "false",
             "Indicates whether the VM can be migrated to different cluster if no host is found in same cluster",true, ConfigKey.Scope.Zone, null);
 
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_MAX_RETRIES_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_MAX_RETRIES, "5",
+            "The maximum retries of kvm heartbeat to write to storage",
+            true, ConfigKey.Scope.Global);
+
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_RETRY_SLEEP_CK =  new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_RETRY_SLEEP, "10000",
+            "The sleep time, in milliseconds, between two kvm heartbeats to write to storage",
+            true, ConfigKey.Scope.Global);
+    public static final ConfigKey<Long> KVM_HEARTBEAT_UPDATE_TIMEOUT_CK = new ConfigKey<>("Advanced", Long.class, KVM_HEARTBEAT_UPDATE_TIMEOUT, "60000",
+            "Timeout(in milliseconds) that kvm heartbeat to write to storage",

Review comment:
       ```suggestion
               "Timeout, in milliseconds, to KVM heartbeat writes to storage.",
   ```
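Taken together, the three numeric settings appear to describe a bounded retry loop: at most N attempts, each write capped by the timeout, with a fixed sleep between attempts. Below is a minimal sketch of those semantics. It assumes this is how `KVMHAMonitor` combines the settings, which the diff does not show for the agent side. `writeWithRetries` is a hypothetical name, and the storage write is modelled as a `Callable<Boolean>`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HeartBeatRetrySketch {
    /**
     * Attempts the write up to maxRetries times. Each attempt is bounded by timeoutMs,
     * and retrySleepMs is waited between attempts. Returns true on the first successful write.
     */
    public static boolean writeWithRetries(Callable<Boolean> write, long maxRetries,
                                           long retrySleepMs, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            for (long attempt = 1; attempt <= maxRetries; attempt++) {
                Future<Boolean> future = executor.submit(write);
                try {
                    if (Boolean.TRUE.equals(future.get(timeoutMs, TimeUnit.MILLISECONDS))) {
                        return true;
                    }
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the stuck write before the next attempt
                } catch (ExecutionException e) {
                    // The write threw; count it as a failed attempt.
                }
                if (attempt < maxRetries) {
                    Thread.sleep(retrySleepMs);
                }
            }
            return false; // all attempts failed: this is where a failure action would apply
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that fails twice, then succeeds.
        boolean ok = writeWithRetries(() -> ++calls[0] >= 3, 5, 10, 1000);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```

The sketch runs every attempt on a single-thread executor. A write that ignores interruption would therefore hold up the attempts queued behind it, and a real implementation would need to decide how to handle a hung storage write.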

##########
File path: engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java
##########
@@ -1840,4 +1878,91 @@ public void propagateChangeToAgents(Map<String, String> params) {
             sendCommandToAgents(hostsPerZone, params);
         }
     }
+
+    protected class ScanDisconnectedHostsTask extends ManagedContextRunnable {
+
+        @Override
+        protected void runInContext() {
+            try {
+                ManagementServerHostVO msHost = _msHostDao.findOneInUpState(new Filter(ManagementServerHostVO.class, "id", true, 0L, 1L));
+                if (msHost == null || (msHost.getMsid() != _nodeId)) {
+                    s_logger.debug("Skipping disconnected hosts scan task");
+                    for (Long hostId : _investigateTasksMap.keySet()) {
+                        cancelInvestigationTask(hostId);
+                    }
+                    return;
+                }
+                for (HostVO host : _hostDao.listByType(Host.Type.Routing)) {
+                    if (host.getStatus() == Status.Disconnected) {
+                        scheduleInvestigationTask(host.getId());
+                    }
+                }
+            } catch (final Exception e) {
+                s_logger.error("Exception caught while scanning disconnected hosts : ", e);
+            }
+        }
+    }
+
+    protected class InvestigationTask extends ManagedContextRunnable {
+        Long _hostId;
+
+        InvestigationTask(final Long hostId) {
+            _hostId = hostId;
+        }
+
+        @Override
+        protected void runInContext() {
+            try {
+                final long hostId = _hostId;
+                s_logger.info("Investigating host " + hostId + " to determine its actual state");
+                HostVO host = _hostDao.findById(hostId);
+                if (host == null) {
+                    s_logger.info("Cancelling investigation on host " + hostId + " which might has been removed");
+                    cancelInvestigationTask(hostId);
+                    return;
+                }
+                if (host.getStatus() != Status.Disconnected) {
+                    s_logger.info("Cancelling investigation on host " + hostId + " in status " + host.getStatus());
+                    cancelInvestigationTask(hostId);
+                    return;
+                }
+                Status determinedState = _haMgr.investigate(hostId);
+                s_logger.info("Investigators determine the status of host " + hostId + " is " + determinedState);
+                if (determinedState == Status.Down) {
+                    agentStatusTransitTo(host, Status.Event.HostDown, _nodeId);
+                    s_logger.info("Scheduling VMs restart on host " + hostId + " which is Down");
+                    _haMgr.scheduleRestartForVmsOnHost(host, true);
+                    s_logger.info("Cancelling investigation on host " + hostId + " which is Down");
+                    cancelInvestigationTask(hostId);
+                }
+            } catch (final Exception e) {

Review comment:
       Do we need a catch Pokémon (`catch them all`) here?
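A broad catch has a concrete justification in a periodic task. Per the `ScheduledExecutorService.scheduleWithFixedDelay` Javadoc, if any execution throws, subsequent executions are suppressed, so one uncaught exception would silently stop the task for good. Whether that applies here depends on whether `ManagedContextRunnable` already swallows exceptions, which this thread does not show. The sketch below uses only the JDK (no CloudStack classes) and a simulated failing scan to contrast an unguarded task with a guarded one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelaySuppressionSketch {
    // Simulated scan that fails on its second run only.
    static void flakyScan(AtomicInteger runs) {
        if (runs.incrementAndGet() == 2) {
            throw new IllegalStateException("simulated failure");
        }
    }

    // Runs both variants for about 200 ms; returns {unguardedRuns, guardedRuns}.
    public static int[] runDemo() {
        AtomicInteger unguarded = new AtomicInteger();
        AtomicInteger guarded = new AtomicInteger();
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        // No catch: the exception on run 2 suppresses every later run.
        executor.scheduleWithFixedDelay(() -> flakyScan(unguarded), 0, 10, TimeUnit.MILLISECONDS);
        // Catch inside the task: the failure is reported and the schedule continues.
        executor.scheduleWithFixedDelay(() -> {
            try {
                flakyScan(guarded);
            } catch (RuntimeException e) {
                System.err.println("scan failed, will run again: " + e.getMessage());
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return new int[] {unguarded.get(), guarded.get()};
    }

    public static void main(String[] args) {
        int[] runs = runDemo();
        System.out.println("unguarded=" + runs[0] + " guarded=" + runs[1]);
    }
}
```

If the broad catch stays, it could at least catch `RuntimeException` rather than `Exception` when no checked exceptions are thrown inside the `try`. That keeps the schedule alive while making the intent explicit.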

##########
File path: engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java
##########
@@ -1840,4 +1878,91 @@ public void propagateChangeToAgents(Map<String, String> params) {
             sendCommandToAgents(hostsPerZone, params);
         }
     }
+
+    protected class ScanDisconnectedHostsTask extends ManagedContextRunnable {
+
+        @Override
+        protected void runInContext() {
+            try {
+                ManagementServerHostVO msHost = _msHostDao.findOneInUpState(new Filter(ManagementServerHostVO.class, "id", true, 0L, 1L));
+                if (msHost == null || (msHost.getMsid() != _nodeId)) {
+                    s_logger.debug("Skipping disconnected hosts scan task");
+                    for (Long hostId : _investigateTasksMap.keySet()) {
+                        cancelInvestigationTask(hostId);
+                    }
+                    return;
+                }
+                for (HostVO host : _hostDao.listByType(Host.Type.Routing)) {
+                    if (host.getStatus() == Status.Disconnected) {
+                        scheduleInvestigationTask(host.getId());
+                    }
+                }
+            } catch (final Exception e) {
+                s_logger.error("Exception caught while scanning disconnected hosts : ", e);
+            }
+        }
+    }
+
+    protected class InvestigationTask extends ManagedContextRunnable {
+        Long _hostId;
+
+        InvestigationTask(final Long hostId) {
+            _hostId = hostId;
+        }
+
+        @Override
+        protected void runInContext() {
+            try {
+                final long hostId = _hostId;
+                s_logger.info("Investigating host " + hostId + " to determine its actual state");
+                HostVO host = _hostDao.findById(hostId);
+                if (host == null) {
+                    s_logger.info("Cancelling investigation on host " + hostId + " which might has been removed");
+                    cancelInvestigationTask(hostId);
+                    return;
+                }
+                if (host.getStatus() != Status.Disconnected) {
+                    s_logger.info("Cancelling investigation on host " + hostId + " in status " + host.getStatus());
+                    cancelInvestigationTask(hostId);
+                    return;
+                }
+                Status determinedState = _haMgr.investigate(hostId);
+                s_logger.info("Investigators determine the status of host " + hostId + " is " + determinedState);
+                if (determinedState == Status.Down) {
+                    agentStatusTransitTo(host, Status.Event.HostDown, _nodeId);
+                    s_logger.info("Scheduling VMs restart on host " + hostId + " which is Down");
+                    _haMgr.scheduleRestartForVmsOnHost(host, true);
+                    s_logger.info("Cancelling investigation on host " + hostId + " which is Down");
+                    cancelInvestigationTask(hostId);
+                }
+            } catch (final Exception e) {
+                s_logger.error("Exception caught while handling investigation task: ", e);
+            }
+        }
+    }
+
+    private void scheduleInvestigationTask(final Long hostId) {
+        ScheduledFuture future = _investigateTasksMap.get(hostId);
+        if (future != null) {
+            s_logger.info("There is already a task to investigate host " + hostId);
+        } else {
+            ScheduledFuture scheduledFuture = _investigatorExecutor.scheduleWithFixedDelay(new InvestigationTask(hostId), InvestigateDisconnectedHostsInterval.value(),
+                    InvestigateDisconnectedHostsInterval.value(), TimeUnit.SECONDS);
+            _investigateTasksMap.put(hostId, scheduledFuture);
+            s_logger.info("Scheduled a task to investigate host " + hostId);
+        }
+    }
+
+    private void cancelInvestigationTask(final Long hostId) {
+        ScheduledFuture future = _investigateTasksMap.get(hostId);
+        if (future != null) {
+            try {
+                future.cancel(false);
+                s_logger.info("Cancelled a task to investigate host " + hostId);
+                _investigateTasksMap.remove(hostId);
+            } catch (Exception e) {

Review comment:
       Do we need a catch Pokémon (`catch them all`) here?
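`Future.cancel(boolean)` declares no checked exceptions, so the try/catch around it appears to have little to catch. There are also two related map issues:

- `scheduleInvestigationTask` calls `get` and then `put`. If it is ever called from more than one thread, this check-then-act sequence could schedule two tasks for the same host.
- `ScanDisconnectedHostsTask` calls `cancelInvestigationTask`, which removes entries, while iterating over `keySet()`. On a plain `HashMap` this can throw `ConcurrentModificationException`. A `ConcurrentHashMap` is safe here because its iterators are weakly consistent.

Below is a minimal sketch that assumes the map can be a `ConcurrentHashMap` (its declaration is not in the diff). The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class InvestigationRegistrySketch {
    private final ConcurrentMap<Long, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Atomically schedules at most one task per host; returns true if a new task was created.
    public boolean schedule(long hostId, Runnable task, long intervalSeconds) {
        boolean[] created = {false};
        tasks.computeIfAbsent(hostId, id -> {
            created[0] = true;
            return executor.scheduleWithFixedDelay(task, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        });
        return created[0];
    }

    // remove() is atomic, and Future.cancel throws no checked exceptions, so no try/catch is needed.
    public boolean cancel(long hostId) {
        ScheduledFuture<?> future = tasks.remove(hostId);
        if (future == null) {
            return false;
        }
        future.cancel(false);
        return true;
    }

    public int size() {
        return tasks.size();
    }

    public void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        InvestigationRegistrySketch registry = new InvestigationRegistrySketch();
        Runnable noop = () -> { };
        System.out.println(registry.schedule(42L, noop, 60)); // true
        System.out.println(registry.schedule(42L, noop, 60)); // false, already scheduled
        System.out.println(registry.cancel(42L));             // true
        System.out.println(registry.cancel(42L));             // false, nothing left
        registry.shutdown();
    }
}
```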

##########
File path: engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java
##########
@@ -1840,4 +1878,91 @@ public void propagateChangeToAgents(Map<String, String> params) {
             sendCommandToAgents(hostsPerZone, params);
         }
     }
+
+    protected class ScanDisconnectedHostsTask extends ManagedContextRunnable {
+
+        @Override
+        protected void runInContext() {
+            try {
+                ManagementServerHostVO msHost = _msHostDao.findOneInUpState(new Filter(ManagementServerHostVO.class, "id", true, 0L, 1L));
+                if (msHost == null || (msHost.getMsid() != _nodeId)) {

Review comment:
       ```suggestion
                   if (msHost == null || msHost.getMsid() != _nodeId) {
   ```

##########
File path: agent/src/main/java/com/cloud/agent/properties/AgentProperties.java
##########
@@ -39,14 +39,6 @@
      */
     public static final Property<Integer> VM_MIGRATE_DOMAIN_RETRIEVE_TIMEOUT = new Property<Integer>("vm.migrate.domain.retrieve.timeout", 10);
 
-    /**
-     * Reboot host and alert management on heartbeat timeout. <br>
-     * Data type: boolean.<br>
-     * Default value: true.
-     */
-    public static final Property<Boolean> REBOOT_HOST_AND_ALERT_MANAGEMENT_ON_HEARTBEAT_TIMEOUT
-        = new Property<Boolean>("reboot.host.and.alert.management.on.heartbeat.timeout", true);

Review comment:
       If we are going to remove this property, we need to state explicitly which option in the new feature behaves the same way, and advise operators on how to configure it.
   
   Also, we should remove it from `agent.properties` and from anywhere else it is used.

##########
File path: engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java
##########
@@ -1840,4 +1878,91 @@ public void propagateChangeToAgents(Map<String, String> params) {
             sendCommandToAgents(hostsPerZone, params);
         }
     }
+
+    protected class ScanDisconnectedHostsTask extends ManagedContextRunnable {
+
+        @Override
+        protected void runInContext() {
+            try {
+                ManagementServerHostVO msHost = _msHostDao.findOneInUpState(new Filter(ManagementServerHostVO.class, "id", true, 0L, 1L));
+                if (msHost == null || (msHost.getMsid() != _nodeId)) {
+                    s_logger.debug("Skipping disconnected hosts scan task");
+                    for (Long hostId : _investigateTasksMap.keySet()) {
+                        cancelInvestigationTask(hostId);
+                    }
+                    return;
+                }
+                for (HostVO host : _hostDao.listByType(Host.Type.Routing)) {
+                    if (host.getStatus() == Status.Disconnected) {
+                        scheduleInvestigationTask(host.getId());
+                    }
+                }
+            } catch (final Exception e) {

Review comment:
       Do we need a catch Pokémon (`catch them all`) here?







[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996423020


   @sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.





[GitHub] [cloudstack] blueorangutan commented on pull request #5783: 4.16 kvm storage issues

Posted by GitBox <gi...@apache.org>.
blueorangutan commented on pull request #5783:
URL: https://github.com/apache/cloudstack/pull/5783#issuecomment-996678170


   Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 1937

