Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/10/29 16:11:58 UTC

[GitHub] [ozone] bharatviswa504 edited a comment on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

bharatviswa504 edited a comment on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > 
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry the information for its own specific nodeID and IP address. The fix for bug HDDS-4292 changed this: it updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is done in HADOOP-17116, which went into the 3.3.1 release, while Apache Ozone depends on 3.2.1. To fix the logging issue in Ozone we need a new release of Hadoop with HADOOP-17116. That is why we still observe this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info carry only its own OM's information, instead of the information of all OMs, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString()))
   {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So with a separate proxy info per OM, when a call to an OM fails for the first time, its proxy info is not yet in `failedAtLeastOnce`. We add it, and if there has been no successful call we return without logging.
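   
   To make the difference concrete, here is a minimal, self-contained sketch of that check. It is not the Hadoop code: the class `FailoverLogSketch`, the method `shouldLogFailover`, and the node IDs and addresses are hypothetical, and the debug-level branch is left out. It only illustrates why a shared "all OMs" description gets logged on the second failover while per-OM descriptions do not.
   
   ```java
   import java.util.HashSet;
   import java.util.Set;

   /** Hypothetical stand-in for the first-failover check; not Hadoop's actual class. */
   public class FailoverLogSketch {
       // Descriptions of proxies that have already failed at least once.
       private final Set<String> failedAtLeastOnce = new HashSet<>();
       // No successful call has been made yet in this sketch.
       private boolean hasSuccessfulCall = false;

       /** Returns true if a failover away from this proxy would be logged at INFO. */
       public boolean shouldLogFailover(String proxyInfo) {
           // add() returns true only the first time this description is seen.
           if (failedAtLeastOnce.add(proxyInfo)) {
               // First failover for this description: log only after a successful call.
               return hasSuccessfulCall;
           }
           // This description has failed before: always log.
           return true;
       }

       public static void main(String[] args) {
           // Per-OM descriptions (assumed format): each first failover stays silent.
           FailoverLogSketch perOm = new FailoverLogSketch();
           System.out.println(perOm.shouldLogFailover("nodeId=om1,nodeAddress=host1:9862")); // false
           System.out.println(perOm.shouldLogFailover("nodeId=om2,nodeAddress=host2:9862")); // false

           // One shared description for every OM: the second failover looks like a repeat.
           String allOms = "[nodeId=om1,nodeId=om2,nodeId=om3]";
           FailoverLogSketch shared = new FailoverLogSketch();
           System.out.println(shared.shouldLogFailover(allOms)); // false
           System.out.println(shared.shouldLogFailover(allOms)); // true
       }
   }
   ```
   
   With the shared description, the message appears after the second OM fails and before the last one is tried, which matches the behaviour described in the quoted comment.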
   
   
   This issue only fixes the logging when the client retries the OMs for the first time, until it finds the leader OM. (It does not fix HDDS-3936.)
   
   
   

