Posted to commits@cloudstack.apache.org by "kishankavala (via GitHub)" <gi...@apache.org> on 2023/12/21 06:24:42 UTC

[PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

kishankavala opened a new pull request, #8394:
URL: https://github.com/apache/cloudstack/pull/8394

   ### Description
   
   This PR moves resources that are stuck in a transitional state out of that state during async job cleanup.
   
   Problem:
   During maintenance of a management server, the other servers in the cluster, or the same server after a restart, initiate async job cleanup. However, this cleanup leaves the affected resources in a transitional state. The only recovery option currently available is to change the database directly.
   
   Solution:
   This PR resolves the problem by moving Volume, Virtual Machine, and Network resources out of their transitional states during cleanup. Failed operations can then be retried without manual database modifications.
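   
   As a rough illustration of the idea (not the code in this PR), cleanup can be pictured as a lookup from a resource type and its transitional state to a resting state, with unknown types skipped. Every name below (`TransitionCleanupSketch`, `ResourceType`, `RECOVERY`, `recover`) and every state pair is an assumption made for the example; the real transitions are defined by CloudStack's own state machines.
   
   ```java
   import java.util.Map;
   import java.util.Optional;
   
   // Hypothetical sketch: maps a resource stuck in a transitional state to a resting state.
   class TransitionCleanupSketch {
       // The PR description names these three resource kinds.
       enum ResourceType { VOLUME, VIRTUAL_MACHINE, NETWORK }
   
       // Illustrative transitional -> resting pairs (assumed, not CloudStack's actual transitions).
       static final Map<ResourceType, Map<String, String>> RECOVERY = Map.of(
           ResourceType.VOLUME, Map.of("Attaching", "Ready", "UploadInProgress", "UploadError"),
           ResourceType.VIRTUAL_MACHINE, Map.of("Starting", "Stopped", "Migrating", "Running"),
           ResourceType.NETWORK, Map.of("Implementing", "Shutdown"));
   
       // Parses names such as "Volume" or "VirtualMachine"; empty if the type is unknown.
       static Optional<ResourceType> parseType(String raw) {
           if (raw == null) {
               return Optional.empty();
           }
           for (ResourceType t : ResourceType.values()) {
               if (t.name().replace("_", "").equalsIgnoreCase(raw)) {
                   return Optional.of(t);
               }
           }
           return Optional.empty();
       }
   
       // Returns the state the resource should end up in; unknown types and
       // non-transitional states are returned unchanged (cleanup is skipped).
       static String recover(String rawType, String currentState) {
           return parseType(rawType)
               .map(t -> RECOVERY.get(t).getOrDefault(currentState, currentState))
               .orElse(currentState);
       }
   }
   ```
   
   With these assumed pairs, `recover("Network", "Implementing")` yields `"Shutdown"`, while an unrecognised type such as `"Snapshot"` leaves the state untouched.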
   
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [X] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [X] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [X] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### Screenshots (if appropriate):
   
   
   ### How Has This Been Tested?
   Tested manually and with unit tests
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "harikrishna-patnala (via GitHub)" <gi...@apache.org>.
harikrishna-patnala commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1433619783


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -35,12 +35,22 @@
 import javax.inject.Inject;
 import javax.naming.ConfigurationException;
 
+import com.cloud.network.Network;

Review Comment:
   can you reorder this please ?





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1865809820

   Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8111




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "rohityadavcloud (via GitHub)" <gi...@apache.org>.
rohityadavcloud commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898322626

   @shwstppr can we get this in 4.19.0.0 ? Thanks.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1878170889

   @sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with  KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1866413354

   @rohityadavcloud a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1867225679

   <b>[SF] Trillian test result (tid-8653)</b>
   Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 47435 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8394-t8653-kvm-centos7.zip
   Smoke tests completed. 126 look OK, 2 have errors, 0 did not run
   Only failed and skipped tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   test_05_list_volumes_isrecursive | `Failure` | 0.03 | test_list_volumes.py
   test_07_list_volumes_listall | `Failure` | 0.02 | test_list_volumes.py
   test_02_upgrade_kubernetes_cluster | `Failure` | 436.85 | test_kubernetes_clusters.py
   




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898667521

   @shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with  KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1880839498

   Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8228




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1880889761

   @sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1882374847

   <b>[SF] Trillian test result (tid-8754)</b>
   Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 54871 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8394-t8754-kvm-centos7.zip
   Smoke tests completed. 121 look OK, 0 have errors, 0 did not run
   Only failed and skipped tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr merged PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898400265

   @shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with  KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898398643

   @blueorangutan package




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1899883178

   > [SF] Trillian test result (tid-8873)
   > Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
   > Total time taken: 48423 seconds
   > Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8394-t8873-xenserver-71.zip
   > Smoke tests completed. 120 look OK, 1 have errors, 0 did not run
   > Only failed and skipped tests results shown below:
   > 
   > Test | Result | Time (s) | Test File
   > --- | --- | --- | ---
   > ContextSuite context=TestSharedNetwork>:setup | `Error` | 64.62 | test_network.py
   
   The error here occurred while creating a network and is not related to the changes in this PR. The changes here reset/clean up any Volume, VM, and Network (in Implementing state) resources for the pending jobs on MS start, and are good to go. cc @shwstppr 
   
   `Execute cmd: createnetwork failed, due to: errorCode: 431, errorText:The VLAN tag to use for new guest network, 2625 is already being used for dynamic vlan allocation for the guest network in zone pr8394-t8873-xenserver-71`




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1878169594

   @blueorangutan package




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898824088

   @shwstppr a [SL] Trillian-Jenkins matrix job (centos7 mgmt + xenserver71, rocky8 mgmt + vmware67u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898823695

   @blueorangutan test matrix




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "kiranchavala (via GitHub)" <gi...@apache.org>.
kiranchavala commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1436812958


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -1128,6 +1139,65 @@ public void doInTransactionWithoutResult(TransactionStatus status) {
         }
     }
 
+    /*
+    Cleanup Resources in transition state and move them to appropriate state
+    This will allow other operation on the resource, instead of being stuck in transition state
+     */
+    protected boolean cleanupResources(AsyncJobVO job) {
+        try {
+            ApiCommandResourceType resourceType = ApiCommandResourceType.fromString(job.getInstanceType());
+            if (resourceType == null) {
+                s_logger.warn("Unknown ResourceType. Skip Cleanup: " + job.getInstanceType());
+                return true;
+            }
+            switch (resourceType) {
+                case Volume:
+                    VolumeInfo vol = volFactory.getVolume(job.getInstanceId());
+                    if (vol == null) {
+                        s_logger.warn("Volume not found. Skip Cleanup. VolumeId: " + job.getInstanceId());

Review Comment:
   @JoaoJandre 
   
   According to the design document 
   
   https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=39620237
   
   When a volume is in the "UploadInProgress" state and the process is interrupted by stopping the SSVM, the volume state changes to "NotUploaded" > "UploadError".
   
   After investigating the issue, the admin user can manually delete the volumes in the "UploadError" state.
   
   **Cleanup**
   
   A cleanup thread will be running at regular intervals (configurable, provide details). It will pick up all volume/template with upload state as "UPLOAD_ERROR" and "ABANDONED" and send agent command to cleanup any partial data from secondary store. The cleanup will be a best-effort approach.
   
   **Recovery mechanisms**
   
   There isn't any recovery or retry mechanism as this is a POST request. Once an error happens, the user is notified with a clear error message as part of the response. The template/volume will remain in the error state and the admin will be able to troubleshoot it based on the appropriate log messages in the management server log, agent log, and apache access/error log files. These failed entries will eventually be cleaned by the cleanup process. The user has to reinitiate the upload by calling the getUploadParams API again.
   
   Global settings are
   
   Upload monitoring interval 
   Upload operation timeout
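   
   To make the quoted cleanup rule concrete, here is a minimal sketch of the selection step only: pick the entries whose upload state is `UPLOAD_ERROR` or `ABANDONED`. The class, field, and method names are hypothetical, and the other enum constants are illustrative. Scheduling the pass at the configured interval and sending the agent command are left out.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical sketch of the selection step of a periodic upload cleanup pass.
   class UploadCleanupSketch {
       // UPLOAD_ERROR and ABANDONED come from the quoted design text; the rest are illustrative.
       enum UploadState { NOT_UPLOADED, UPLOAD_IN_PROGRESS, UPLOADED, UPLOAD_ERROR, ABANDONED }
   
       // Assumed minimal view of a volume/template record.
       static final class UploadEntry {
           final long id;
           final UploadState state;
   
           UploadEntry(long id, UploadState state) {
               this.id = id;
               this.state = state;
           }
       }
   
       // Returns the ids whose partial data a cleanup pass would try to remove (best effort).
       static List<Long> selectForCleanup(List<UploadEntry> entries) {
           List<Long> ids = new ArrayList<>();
           for (UploadEntry e : entries) {
               if (e.state == UploadState.UPLOAD_ERROR || e.state == UploadState.ABANDONED) {
                   ids.add(e.id);
               }
           }
           return ids;
       }
   }
   ```
   
   Entries still in progress are deliberately ignored here; per the quoted text they only become eligible once they reach one of the two error states.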
   
   
   
   
    





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1865616137

   ## [Codecov](https://app.codecov.io/gh/apache/cloudstack/pull/8394?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Comparison is base [(`1411da1`)](https://app.codecov.io/gh/apache/cloudstack/commit/1411da1a22bc6aa26634f3038475e3d5fbbcd6bb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 30.88% compared to head [(`c4cb6be`)](https://app.codecov.io/gh/apache/cloudstack/pull/8394?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 4.39%.
   
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@             Coverage Diff              @@
   ##               main   #8394       +/-   ##
   ============================================
   - Coverage     30.88%   4.39%   -26.49%     
   ============================================
     Files          5341     361     -4980     
     Lines        374861   28622   -346239     
     Branches      54518    4992    -49526     
   ============================================
   - Hits         115769    1258   -114511     
   + Misses       243825   27225   -216600     
   + Partials      15267     139    -15128     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/cloudstack/pull/8394/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [simulator-marvin-tests](https://app.codecov.io/gh/apache/cloudstack/pull/8394/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [uitests](https://app.codecov.io/gh/apache/cloudstack/pull/8394/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `4.39% <ø> (ø)` | |
   | [unit-tests](https://app.codecov.io/gh/apache/cloudstack/pull/8394/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/cloudstack/pull/8394?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1880733699

   @blueorangutan package




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "JoaoJandre (via GitHub)" <gi...@apache.org>.
JoaoJandre commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1436970078


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -1128,6 +1139,65 @@ public void doInTransactionWithoutResult(TransactionStatus status) {
         }
     }
 
+    /*
+    Cleanup Resources in transition state and move them to appropriate state
+    This will allow other operation on the resource, instead of being stuck in transition state
+     */
+    protected boolean cleanupResources(AsyncJobVO job) {
+        try {
+            ApiCommandResourceType resourceType = ApiCommandResourceType.fromString(job.getInstanceType());
+            if (resourceType == null) {
+                s_logger.warn("Unknown ResourceType. Skip Cleanup: " + job.getInstanceType());
+                return true;
+            }
+            switch (resourceType) {
+                case Volume:
+                    VolumeInfo vol = volFactory.getVolume(job.getInstanceId());
+                    if (vol == null) {
+                        s_logger.warn("Volume not found. Skip Cleanup. VolumeId: " + job.getInstanceId());

Review Comment:
   @kiranchavala thanks for the detailed answer





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "harikrishna-patnala (via GitHub)" <gi...@apache.org>.
harikrishna-patnala commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1865646801

   @blueorangutan package




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1866494948

   Moving this to 4.19.1 milestone for now cc @rohityadavcloud 
   If we are not able to cut RC this week and tests look good we can move it back and merge




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1865650945

   @harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with  KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "JoaoJandre (via GitHub)" <gi...@apache.org>.
JoaoJandre commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1434482557


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -1128,6 +1139,65 @@ public void doInTransactionWithoutResult(TransactionStatus status) {
         }
     }
 
+    /*
+    Cleanup Resources in transition state and move them to appropriate state
+    This will allow other operation on the resource, instead of being stuck in transition state
+     */
+    protected boolean cleanupResources(AsyncJobVO job) {
+        try {
+            ApiCommandResourceType resourceType = ApiCommandResourceType.fromString(job.getInstanceType());
+            if (resourceType == null) {
+                s_logger.warn("Unknown ResourceType. Skip Cleanup: " + job.getInstanceType());
+                return true;
+            }
+            switch (resourceType) {
+                case Volume:
+                    VolumeInfo vol = volFactory.getVolume(job.getInstanceId());
+                    if (vol == null) {
+                        s_logger.warn("Volume not found. Skip Cleanup. VolumeId: " + job.getInstanceId());

Review Comment:
   When the volume is in state UploadInProgress (meaning the upload should have been started already),  shouldn't we also clean up whatever was uploaded? 





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1899837012

   <b>[SF] Trillian test result (tid-8873)</b>
   Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 48423 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8394-t8873-xenserver-71.zip
   Smoke tests completed. 120 look OK, 1 have errors, 0 did not run
   Only failed and skipped tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   ContextSuite context=TestSharedNetwork>:setup | `Error` | 64.62 | test_network.py
   




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898398349

   @blueorangutan test matrix




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1899755551

   <b>[SF] Trillian test result (tid-8875)</b>
   Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
   Total time taken: 41654 seconds
   Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8394-t8875-kvm-centos7.zip
   Smoke tests completed. 121 look OK, 0 have errors, 0 did not run
   Only failed and skipped tests results shown below:
   
   
   Test | Result | Time (s) | Test File
   --- | --- | --- | ---
   




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "kiranchavala (via GitHub)" <gi...@apache.org>.
kiranchavala commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1436812958


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -1128,6 +1139,65 @@ public void doInTransactionWithoutResult(TransactionStatus status) {
         }
     }
 
+    /*
+    Cleanup Resources in transition state and move them to appropriate state
+    This will allow other operation on the resource, instead of being stuck in transition state
+     */
+    protected boolean cleanupResources(AsyncJobVO job) {
+        try {
+            ApiCommandResourceType resourceType = ApiCommandResourceType.fromString(job.getInstanceType());
+            if (resourceType == null) {
+                s_logger.warn("Unknown ResourceType. Skip Cleanup: " + job.getInstanceType());
+                return true;
+            }
+            switch (resourceType) {
+                case Volume:
+                    VolumeInfo vol = volFactory.getVolume(job.getInstanceId());
+                    if (vol == null) {
+                        s_logger.warn("Volume not found. Skip Cleanup. VolumeId: " + job.getInstanceId());

Review Comment:
   @JoaoJandre  @sureshanaparti 
   
   According to the design document 
   
   https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=39620237
   
   When a volume is in the "UploadInProgress" state and the upload is interrupted by stopping the SSVM,
   
   the volume state changes to "NotUploaded" > "UploadError".
   
   After investigating the issue, the admin user can manually delete the volumes in the "UploadError" state.
   
   **Cleanup**
   
   A cleanup thread will be running at regular intervals (configurable, provide details). It will pick up all volume/template with upload state as "UPLOAD_ERROR" and "ABANDONED" and send agent command to cleanup any partial data from secondary store. The cleanup will be a best-effort approach.
   
   **Recovery mechanisms**
   
   There isn't any recovery or retry mechanism as this is a POST request. Once an error happens, the user gets notified with a clear error message as part of the response. The template/volume will remain in the error state and the admin will be able to troubleshoot it based on the appropriate log messages in the management server log, agent log, and apache access/error log files. These failed entries will eventually be cleaned up by the cleanup process. The user has to reinitiate the upload by calling the getUploadParams API again.
   
   Global settings are
   
   Upload monitoring interval 
   Upload operation timeout
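   
   As a rough illustration of the selection step the design document describes, the sketch below filters uploads down to those in the "UPLOAD_ERROR" or "ABANDONED" state. The class, enum, and method names here are hypothetical stand-ins, not CloudStack classes, and the agent command that removes partial data from secondary storage is left out.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types exist in CloudStack under these names.
class UploadCleanupSketch {

    // Assumed upload states; only UPLOAD_ERROR and ABANDONED are named in the design text.
    enum UploadState { NOT_UPLOADED, UPLOAD_IN_PROGRESS, UPLOAD_ERROR, ABANDONED, UPLOADED }

    // Minimal stand-in for a volume or template upload record.
    static final class Upload {
        final long id;
        final UploadState state;

        Upload(long id, UploadState state) {
            this.id = id;
            this.state = state;
        }
    }

    // States the periodic cleanup pass would pick up.
    static final Set<UploadState> CLEANABLE =
            EnumSet.of(UploadState.UPLOAD_ERROR, UploadState.ABANDONED);

    // Returns the ids of uploads whose partial data should be cleaned up.
    static List<Long> selectForCleanup(List<Upload> uploads) {
        List<Long> ids = new ArrayList<>();
        for (Upload upload : uploads) {
            if (CLEANABLE.contains(upload.state)) {
                ids.add(upload.id);
            }
        }
        return ids;
    }
}
```

   An upload still in progress is deliberately not selected here, so a pass like this would not by itself cover resources left in a transitional state by an interrupted job.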
   
   
   
   
    





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1880887889

   @blueorangutan test




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1878222939

   Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8205




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1880745023

   @sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with  KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "sureshanaparti (via GitHub)" <gi...@apache.org>.
sureshanaparti commented on code in PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#discussion_r1442521408


##########
framework/jobs/src/main/java/org/apache/cloudstack/framework/jobs/impl/AsyncJobManagerImpl.java:
##########
@@ -35,12 +35,22 @@
 import javax.inject.Inject;
 import javax.naming.ConfigurationException;
 
+import com.cloud.network.Network;

Review Comment:
   done





Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "shwstppr (via GitHub)" <gi...@apache.org>.
shwstppr commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898666353

   @blueorangutan package




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "blueorangutan (via GitHub)" <gi...@apache.org>.
blueorangutan commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1898796013

   Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8370




Re: [PR] CleanUp Async Jobs after mgmt server maintenance [cloudstack]

Posted by "rohityadavcloud (via GitHub)" <gi...@apache.org>.
rohityadavcloud commented on PR #8394:
URL: https://github.com/apache/cloudstack/pull/8394#issuecomment-1866409771

   @blueorangutan test

