Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/05/18 09:59:00 UTC
[jira] [Comment Edited] (HDDS-5228) Make OM FailOverProxyProvider work across threads

    [ https://issues.apache.org/jira/browse/HDDS-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346759#comment-17346759 ] 

Bharat Viswanadham edited comment on HDDS-5228 at 5/18/21, 9:58 AM:
--------------------------------------------------------------------

{quote}Even if RpcClient is shared across threads, they all will have the same FailoverProxyProvider. If the 1st thread fails over and discovers the leader OM, all the subsequent requests (from any thread) will be directed to the correct OM. I do not see how the retry count will be exhausted because of shared threads. Please let me know if I am missing something here.{quote}

The problem here is that we update currentProxyNodeId in RetryPolicy#shouldRetry.

So, let's say two threads are both contacting OM1, and OM1 is down.

T1 updates the proxy to OM2 and updates the proxy in proxyDescriptor.
T2 updates the proxy to OM3 and updates the proxy in proxyDescriptor.

So if T1 and T2 are running in parallel, once T1 has updated the proxy, T2 should not update it again.

RetryInvocationHandler handles this case by comparing the expected failover count and not calling performFailover when it is stale, but our performFailover is a no-op and currentProxyNodeId is updated in shouldRetry, so that guard never applies to us.
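
To illustrate the guard described above, here is a minimal sketch of a failover that is skipped when another thread has already failed over. The class FailoverSketch and its methods are hypothetical names made up for this example; this is not the Ozone or Hadoop code, only a simplified model of the "compare the expected failover count" idea, with T1 and T2 simulated sequentially.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of a failover guard; not Ozone or Hadoop code. */
public class FailoverSketch {
    private final List<String> nodes;
    private int currentIndex = 0;   // index of the node currently in use
    private long failoverCount = 0; // number of failovers performed so far

    public FailoverSketch(List<String> nodes) {
        this.nodes = nodes;
    }

    public synchronized long getFailoverCount() {
        return failoverCount;
    }

    public synchronized String getCurrentNode() {
        return nodes.get(currentIndex);
    }

    /**
     * Moves to the next node only if nobody else has failed over since the
     * caller read expectedCount. Returns true if this call changed the node.
     */
    public synchronized boolean failover(long expectedCount) {
        if (failoverCount != expectedCount) {
            return false; // another thread already failed over; keep its choice
        }
        currentIndex = (currentIndex + 1) % nodes.size();
        failoverCount++;
        return true;
    }

    public static void main(String[] args) {
        FailoverSketch provider =
            new FailoverSketch(Arrays.asList("OM1", "OM2", "OM3"));

        // Both threads start their call while OM1 is current.
        long seenByT1 = provider.getFailoverCount();
        long seenByT2 = provider.getFailoverCount();

        provider.failover(seenByT1); // T1 moves OM1 -> OM2
        provider.failover(seenByT2); // T2 holds a stale count, so this is a no-op

        System.out.println(provider.getCurrentNode()); // prints "OM2"
    }
}
```

Without the count check, the second call would move the proxy on to OM3 even though T1 had already switched to OM2, which is the situation described for T1 and T2 above.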


Recently we have fixed this for SCM, for more info refer to this.
https://github.com/apache/ozone/pull/2249#issue-645725169



> Make OM FailOverProxyProvider work across threads
> -------------------------------------------------
>
>                 Key: HDDS-5228
>                 URL: https://issues.apache.org/jira/browse/HDDS-5228
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>
> Use performFailover to do the failover instead of updating the proxy in RetryPolicy#shouldRetry.
> With the current approach, if RpcClient is shared across threads, it will unnecessarily exhaust the retry count.


