Posted to dev@sling.apache.org by "José Correia (Jira)" <ji...@apache.org> on 2022/04/04 13:18:00 UTC

[jira] [Commented] (SLING-11181) Emit metrics that distinguish transient and permanent distribution failures

    [ https://issues.apache.org/jira/browse/SLING-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516809#comment-17516809 ] 

José Correia commented on SLING-11181:
--------------------------------------

The PR implementing these changes: [PR-105|https://github.com/apache/sling-org-apache-sling-distribution-journal/pull/105] 

[~marett] can you take a look whenever possible? 🙏

> Emit metrics that distinguish transient and permanent distribution failures
> ---------------------------------------------------------------------------
>
>                 Key: SLING-11181
>                 URL: https://issues.apache.org/jira/browse/SLING-11181
>             Project: Sling
>          Issue Type: Improvement
>          Components: Content Distribution
>            Reporter: José Correia
>            Priority: Major
>
> h3. Context
> Currently, our error metrics don't distinguish between permanent distribution failures, which keep failing even when retried, and transient failures, which succeed after being retried.
> We want to improve this so that the two scenarios can be told apart.
> h3. Solution
> The failure metric should be labeled with one of:
>  * {{Transient failure}}
>  * {{Permanent failure}}
> h3. Proposed approach
> We can distinguish the two scenarios using the following rationale:
>  * A transient failure occurs whenever a package is distributed successfully but needed more than one attempt: {{retries > 0}}
>  
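A minimal Java sketch of the proposed transient rule, assuming a simple in-process counter per label. `FailureType`, `FailureMetrics`, `DistributionAttempts.distribute` and the `maxAttempts` limit are hypothetical names chosen for illustration; they are not the Sling metrics API and not the code in PR-105. The quoted description only states the rule for transient failures, so the sketch never records `PERMANENT` on its own.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BooleanSupplier;

// The two label values proposed for the failure metric.
enum FailureType { TRANSIENT, PERMANENT }

// Hypothetical in-memory stand-in for a labeled failure metric.
// NOT the Sling metrics API: it only keeps one thread-safe counter per label.
final class FailureMetrics {
    private final Map<FailureType, LongAdder> counters = new EnumMap<>(FailureType.class);

    FailureMetrics() {
        // Create every counter up front so the map is only read afterwards.
        for (FailureType type : FailureType.values()) {
            counters.put(type, new LongAdder());
        }
    }

    // Increment the counter for the given label.
    void record(FailureType type) {
        counters.get(type).increment();
    }

    // Current value of the counter for the given label.
    long count(FailureType type) {
        return counters.get(type).sum();
    }
}

// Hypothetical processing loop for one package.
final class DistributionAttempts {
    private DistributionAttempts() {}

    /**
     * Calls {@code attempt} until it returns true or {@code maxAttempts} (hypothetical
     * limit, expected to be >= 1) attempts have been made.
     * Returns the number of retries on success, or -1 if every attempt failed.
     */
    static int distribute(BooleanSupplier attempt, int maxAttempts, FailureMetrics metrics) {
        for (int retries = 0; retries < maxAttempts; retries++) {
            if (attempt.getAsBoolean()) {
                // Transient rule from the issue: distributed successfully with retries > 0.
                if (retries > 0) {
                    metrics.record(FailureType.TRANSIENT);
                }
                return retries;
            }
        }
        // The quoted issue text does not define when a failure is permanent,
        // so this sketch deliberately records nothing here.
        return -1;
    }
}
```

A package that succeeds on its first attempt records nothing, and one that succeeds on, say, its third attempt (`retries == 2`) increments only the `TRANSIENT` counter. Modelling the label as an enum keeps the set of label values closed to the two proposed in the issue.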



--
This message was sent by Atlassian Jira
(v8.20.1#820001)