Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/10/29 01:28:50 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

bharatviswa504 opened a new pull request #1531:
URL: https://github.com/apache/ozone/pull/1531


   ## What changes were proposed in this pull request?
   
   Skipping the retry INFO log on the first failover from a proxy is broken. This PR fixes that behavior.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4405
   
   ## How was this patch tested?
   
   Tested it on the deployed ozone cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718408846


   Thanks @bharatviswa504  for fixing it.
   +1 LGTM. Could you please check the failed checks above once? Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718966688


   Thanks Bharat. +1.




[GitHub] [ozone] bharatviswa504 edited a comment on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry only its own nodeID and IP address. The fix for the bug HDDS-4292 updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is in HADOOP-17116, which went into 3.4.0 (there is no Apache release of it yet), while Apache Ozone depends on Hadoop 3.2.1. To fix the logging issue in Ozone we need a new Hadoop release that includes HADOOP-17116. That is why we are still observing this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info hold only its own OM's information, instead of all OMs' information, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString())) {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So when each proxy info holds only its own node's information and a call fails, that proxy info is not yet in failedAtLeastOnce, so we add it, and if there has been no successful call we return without logging. But with the current code each proxy has all OM info, so it logs after the first failover (as the proxy info is the same for all OMs).
   
   
   This issue only fixes the logging while the client tries the OMs for the first time until it finds the leader OM. (It does not fix HDDS-3936.)
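
The effect described above can be sketched in isolation. This is a minimal illustration, not the Hadoop code: `FailoverLogSketch`, `shouldLogFailover`, and `countInfoLogs` are hypothetical names, the proxy-info strings are made up, and debug logging is assumed to be disabled. It only mimics the `failedAtLeastOnce` check quoted in this comment to show why a shared proxy-info string defeats the first-failover skip.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; mirrors only the failedAtLeastOnce check quoted above.
class FailoverLogSketch {
  // Proxy-info strings that have already failed over at least once.
  private final Set<String> failedAtLeastOnce = new HashSet<>();

  // Returns true if this failover would be logged at INFO
  // (assumes debug logging is disabled).
  boolean shouldLogFailover(String proxyInfo, boolean hasSuccessfulCall) {
    // Set.add returns true only when the string was not present before.
    if (failedAtLeastOnce.add(proxyInfo)) {
      // First failover for this proxy info: log only after a successful call.
      return hasSuccessfulCall;
    }
    return true;
  }

  // Counts INFO logs while failing over once from each proxy in order,
  // with no successful call made yet.
  static int countInfoLogs(List<String> proxyInfos) {
    FailoverLogSketch sketch = new FailoverLogSketch();
    int logged = 0;
    for (String proxyInfo : proxyInfos) {
      if (sketch.shouldLogFailover(proxyInfo, false)) {
        logged++;
      }
    }
    return logged;
  }

  public static void main(String[] args) {
    // Each proxy describes only its own OM (made-up values).
    List<String> perOm = List.of("nodeId=om1", "nodeId=om2", "nodeId=om3");
    // Every proxy carries the same combined description of all OMs.
    String all = "nodeId=om1,nodeId=om2,nodeId=om3";
    List<String> shared = List.of(all, all, all);

    System.out.println("per-OM proxy info: " + countInfoLogs(perOm));
    System.out.println("shared proxy info: " + countInfoLogs(shared));
  }
}
```

With per-OM strings every failover is the first one for its proxy, so nothing is logged. With the shared string only the first failover is skipped, and each later one is treated as a repeat and logged.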
   
   
   




[GitHub] [ozone] bharatviswa504 edited a comment on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry only its own nodeID and IP address. The fix for the bug HDDS-4292 updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is in HADOOP-17116, which went into the 3.3.1 release, while Apache Ozone depends on Hadoop 3.2.1. To fix the logging issue in Ozone we need a new Hadoop release that includes HADOOP-17116. That is why we are still observing this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info hold only its own OM's information, instead of all OMs' information, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString())) {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So with a proxy info per OM, if the call fails, the proxy info is not yet in failedAtLeastOnce, so we add it, and if there has been no successful call we return without logging.
   
   
   This issue only fixes the logging while the client tries the OMs for the first time until it finds the leader OM. (It does not fix HDDS-3936.)
   
   
   




[GitHub] [ozone] adoroszlai commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718884115


   > The fix for skipping log is done in HADOOP-17116
   
   Thanks @bharatviswa504 for the detailed explanation.




[GitHub] [ozone] bharatviswa504 edited a comment on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry only its own nodeID and IP address. The fix for the bug HDDS-4292 updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is in HADOOP-17116, which went into the 3.3.1 release, while Apache Ozone depends on Hadoop 3.2.1. To fix the logging issue in Ozone we need a new Hadoop release that includes HADOOP-17116. That is why we are still observing this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info hold only its own OM's information, instead of all OMs' information, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString())) {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So with a proxy info per OM, if the call fails, the proxy info is not yet in failedAtLeastOnce, so we add it, and if there has been no successful call we return without logging. But with the current code each proxy has all OM info, so it logs after the first failover.
   
   
   This issue only fixes the logging while the client tries the OMs for the first time until it finds the leader OM. (It does not fix HDDS-3936.)
   
   
   




[GitHub] [ozone] bharatviswa504 edited a comment on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry only its own nodeID and IP address. The fix for the bug HDDS-4292 updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is in HADOOP-17116, which went into the 3.3.1 release, while Apache Ozone depends on Hadoop 3.2.1. To fix the logging issue in Ozone we need a new Hadoop release that includes HADOOP-17116. That is why we are still observing this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info hold only its own OM's information, instead of all OMs' information, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString())) {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So when each proxy info holds only its own node's information and a call fails, that proxy info is not yet in failedAtLeastOnce, so we add it, and if there has been no successful call we return without logging. But with the current code each proxy has all OM info, so it logs after the first failover (as the proxy info is the same for all OMs).
   
   
   This issue only fixes the logging while the client tries the OMs for the first time until it finds the leader OM. (It does not fix HDDS-3936.)
   
   
   




[GitHub] [ozone] bharatviswa504 merged pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #1531:
URL: https://github.com/apache/ozone/pull/1531


   




[GitHub] [ozone] hanishakoneru commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718848291


   @bharatviswa504, can you elaborate on the issue and the fix in the description please. It is not clear why the issue occurred and what the fix is. Thanks.




[GitHub] [ozone] bharatviswa504 commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718967900


   Thank You @umamaheswararao @adoroszlai and @hanishakoneru for the review.




[GitHub] [ozone] bharatviswa504 commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718809926


   Thank You @umamaheswararao for the review.




[GitHub] [ozone] bharatviswa504 commented on pull request #1531: HDDS-4405. Proxy failover is logging with out trying all OMS.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1531:
URL: https://github.com/apache/ozone/pull/1531#issuecomment-718857999


   > I think there are 2 problems and I'm confused about which one is being fixed here:
   > 
   > 1. Failover exception is shown in client unnecessarily (after trying 2 OMs, before trying last one).
   > 2. Failover exception message shows list of all OMs (submitRequest over nodeId=...) instead of the single OM it contacted.
   > The PR description and issue title suggest that it is about problem 1, but the patch seems to fix problem 2 instead.
   > 
   > The failover exception is shown if client contacts OMs in "follower1, follower2, leader" order. Which it does occasionally because client ignores "suggested leader" info and tries all OMs in order (as reported in HDDS-3936).
   
   The issue here is that each proxyInfo object should carry only its own nodeID and IP address. The fix for the bug HDDS-4292 updated the proxyInfo of each object to contain all proxy info (map of proxy info to String).
   
   The fix for skipping the log is in HADOOP-17116, which went into the 3.3.1 release, while Apache Ozone depends on Hadoop 3.2.1. To fix the logging issue in Ozone we need a new Hadoop release that includes HADOOP-17116. That is why we are still observing this in Apache Ozone.
   
   Internally at Cloudera, we have backported HADOOP-17116, and HDDS-4292 broke this.
   
   Having each proxy info hold only its own OM's information, instead of all OMs' information, will fix the logging once we have the HADOOP-17116 fix.
   
   
   
   
   
   ```
   boolean info = true;
   // If this is the first failover to this proxy, skip logging at INFO level
   if (!failedAtLeastOnce.contains(proxyDescriptor.getProxyInfo().toString())) {
     failedAtLeastOnce.add(proxyDescriptor.getProxyInfo().toString());

     // If successful calls were made to this proxy, log info even for first
     // failover
     info = hasSuccessfulCall || asyncCallHandler.hasSuccessfulCall();
     if (!info && !LOG.isDebugEnabled()) {
       return;
     }
   }
   ```
   
   So with a proxy info per OM, if the call fails, the proxy info is not yet in failedAtLeastOnce, so we add it, and if there has been no successful call we return without logging.
   
   
   
   
   

