Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/12/06 13:14:00 UTC

[jira] [Resolved] (HDDS-9826) Fix exception handling if one Datanode is not available (Ratis)

     [ https://issues.apache.org/jira/browse/HDDS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-9826.
------------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

> Fix exception handling if one Datanode is not available (Ratis)
> ---------------------------------------------------------------
>
>                 Key: HDDS-9826
>                 URL: https://issues.apache.org/jira/browse/HDDS-9826
>             Project: Apache Ozone
>          Issue Type: Task
>          Components: SCM Client
>    Affects Versions: 1.3.0
>            Reporter: Ivan Brusentsev
>            Assignee: Ivan Brusentsev
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> When a key is being uploaded by XceiverClientRatis and a datanode becomes unavailable, the client is expected to request a new pipeline and retry the upload.
> In practice, before doing so the client first repeats the commit check with the _MAJORITY_COMMITTED_ replication level, which cannot succeed because the pipeline is already closed at that point.
> XceiverClientRatis has a method watchForCommit(long index), which contains this exception check:
>  
> {code:java}
> if (t instanceof GroupMismatchException) {
>   throw e;
> }
> {code}
> GroupMismatchException is thrown by the Ratis client exactly when a datanode is unavailable and the current pipeline can no longer be used for the key upload.
> But this check does not work, because
> {code:java}
> Throwable t = HddsClientUtils.checkForException(e);{code}
>  does not unwrap the exception completely.
> The idea is to fix the lookup of nested exceptions so that the proper one is found. This improves failover latency by approximately 15 seconds.
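
As an illustration of the nested-exception lookup described above, here is a minimal sketch that walks the whole cause chain instead of unwrapping a single level. It is not the actual Ozone change: the class NestedCauseLookup, the helper findCause, the depth limit, and the stand-in GroupMismatchException (used so the example compiles without Ratis on the classpath) are all hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

public class NestedCauseLookup {

  // Hypothetical stand-in for the Ratis GroupMismatchException,
  // declared here only so the sketch is self-contained.
  public static class GroupMismatchException extends IOException {
    public GroupMismatchException(String message) {
      super(message);
    }
  }

  // Assumed upper bound on chain length; guards against cyclic cause chains.
  private static final int MAX_DEPTH = 32;

  /**
   * Walks the cause chain starting at e (inclusive) and returns the first
   * throwable that is an instance of type, or null if none is found.
   */
  public static <T extends Throwable> T findCause(Throwable e, Class<T> type) {
    int depth = 0;
    for (Throwable t = e; t != null && depth < MAX_DEPTH; t = t.getCause()) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      depth++;
    }
    return null;
  }

  public static void main(String[] args) {
    // Two wrapper layers around the exception of interest, similar in shape
    // to what an asynchronous client call may produce.
    Throwable wrapped = new ExecutionException(
        new CompletionException(new GroupMismatchException("group not found")));

    // Looking only at the direct cause misses the nested exception.
    boolean directHit = wrapped.getCause() instanceof GroupMismatchException;

    // Walking the full chain finds it.
    GroupMismatchException found =
        findCause(wrapped, GroupMismatchException.class);

    System.out.println("direct cause matches: " + directHit);
    System.out.println("full chain matches: " + (found != null));
  }
}
```

With a lookup of this shape, a check such as `t instanceof GroupMismatchException` could be replaced by a null check on the result of findCause, so that the wrapper layers no longer hide the exception.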



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
