Posted to dev@sling.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2019/05/09 11:45:00 UTC

[jira] [Commented] (SLING-8408) DistributionQueueHealthCheck should deal with failing queries

    [ https://issues.apache.org/jira/browse/SLING-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836298#comment-16836298 ] 

Thomas Mueller commented on SLING-8408:
---------------------------------------

Patch for sling-org-apache-sling-distribution-core:

{noformat}
diff --git a/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java b/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
index 38bf41e..caffc0d 100644
--- a/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
+++ b/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java
@@ -124,8 +124,9 @@ public class DistributionQueueHealthCheck implements HealthCheck {
                         } else {
                             resultLog.debug("No items in queue [{}]", q.getName());
                         }
-
-                    } catch (Exception e) {
+                    } catch (IllegalStateException e) {
+                        resultLog.healthCheckError("The job index is not available (just yet) while inspecting replication agent [{}]", queueName);
+                    } catch (Exception e) {
                         resultLog.warn("Exception while inspecting distribution queue [{}]: {}", queueName, e);
                     }
                 }
{noformat}

* Catch IllegalStateException, because that is what SLING-8407 throws when no index is available.
* Report this as a health check error: it means the index is not available, which can happen at the very first startup, or later on if someone removes the index. In both cases the system is not in a good state, so reporting an error is appropriate. I expect nobody monitors the health checks during the very first startup (while the repository is being initialized), but I would argue that during that time the system is in fact not available.
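
To make the ordering of the two catch clauses concrete, here is a minimal, self-contained sketch of the idea. It does not use the Sling health check API: the `Status` enum, the `inspect` method and the `Supplier` standing in for the queue lookup are all hypothetical names chosen for illustration. The only point is that the more specific IllegalStateException has to be caught before the generic Exception, so that a missing index is reported as an error while any other failure stays a warning.

```java
import java.util.function.Supplier;

public class QueueCheckSketch {

    /** Hypothetical stand-in for the health check result status. */
    public enum Status { OK, WARN, HEALTH_CHECK_ERROR }

    /**
     * Inspects one queue. The supplier is a hypothetical stand-in for the
     * call that indirectly runs the query and may fail.
     */
    public static Status inspect(String queueName, Supplier<Integer> itemCount) {
        try {
            int items = itemCount.get();
            System.out.println("Queue [" + queueName + "] has " + items + " item(s)");
            return Status.OK;
        } catch (IllegalStateException e) {
            // Specific case first: assumed to mean "index not available".
            System.out.println("Index not available while inspecting [" + queueName + "]");
            return Status.HEALTH_CHECK_ERROR;
        } catch (Exception e) {
            // Everything else keeps the previous behaviour: a warning.
            System.out.println("Exception while inspecting [" + queueName + "]: " + e);
            return Status.WARN;
        }
    }

    public static void main(String[] args) {
        System.out.println(inspect("default", () -> 0));
        System.out.println(inspect("default", () -> {
            throw new IllegalStateException("no index");
        }));
        System.out.println(inspect("default", () -> {
            throw new UnsupportedOperationException("something else");
        }));
    }
}
```

If the two catch clauses were swapped, the code would not compile, because the IllegalStateException clause would be unreachable after the clause for Exception.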


> DistributionQueueHealthCheck should deal with failing queries
> -------------------------------------------------------------
>
>                 Key: SLING-8408
>                 URL: https://issues.apache.org/jira/browse/SLING-8408
>             Project: Sling
>          Issue Type: Improvement
>          Components: Content Distribution
>            Reporter: Thomas Mueller
>            Priority: Major
>
> The following health check indirectly runs queries that might fail:
>  * [DistributionQueueHealthCheck|https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/monitor/DistributionQueueHealthCheck.java]: sling-org-apache-sling-distribution-core/src/main/java/org/apache/sling/distribution/monitor
> It calls [JobManagerImpl.findJobs|https://github.com/apache/sling-org-apache-sling-event/blob/master/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L373], which, with SLING-8407, can throw an exception if the index is not yet available. The health check should catch this exception and return HEALTH_CHECK_ERROR in this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)