Posted to dev@knox.apache.org by "Matthew Sharp (JIRA)" <ji...@apache.org> on 2018/09/06 13:11:00 UTC

[jira] [Created] (KNOX-1436) AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging

Matthew Sharp created KNOX-1436:
-----------------------------------

             Summary: AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
                 Key: KNOX-1436
                 URL: https://issues.apache.org/jira/browse/KNOX-1436
             Project: Apache Knox
          Issue Type: Bug
            Reporter: Matthew Sharp


The current WebHDFS failoverRequest method makes it difficult to tell which host a request failed on and which host it will retry next.

Example:
{code:java}
2018-09-06 07:49:07,245 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:07,246 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:08,278 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:08,279 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,366 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:10,367 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,368 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to failover reached for service: WEBHDFS
{code}
In the example above, the first request to host1.example.com had already failed, yet the message logged for that failure says the request is failing over to a different server while still printing the host1.example.com URL.

Suggestion:

The HaDispatchMessages failingOverRequest call should be moved below the markFailedURL call, so that it logs the next URI the request is failing over to, not the URI it has just failed on.
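
A minimal, self-contained sketch of the suggested ordering follows. It is not the Knox code: UrlRotation, failover and the message text are hypothetical stand-ins for the HA provider, failoverRequest and the failingOverRequest message, and the base URLs are shortened from the log example above. The only point it illustrates is that the log entry is written after the failed URL has been marked, so it can name both the failed URL and the next one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class FailoverLoggingSketch {

    /** Hypothetical stand-in for an HA provider: the head of the queue is the active URL. */
    static final class UrlRotation {
        private final Deque<String> urls = new ArrayDeque<>();

        UrlRotation(List<String> initial) {
            urls.addAll(initial);
        }

        String activeUrl() {
            return urls.peekFirst();
        }

        /** Moves the failed URL to the back so that the next candidate becomes active. */
        void markFailedUrl(String failed) {
            if (failed.equals(urls.peekFirst())) {
                urls.addLast(urls.pollFirst());
            }
        }
    }

    /**
     * Marks the failed URL first and logs afterwards, so the message
     * reports the URL that will actually be tried next.
     */
    static String failover(UrlRotation rotation, String failedUrl, List<String> log) {
        rotation.markFailedUrl(failedUrl);
        String next = rotation.activeUrl();
        // Hypothetical message format; the real message is defined in HaDispatchMessages.
        log.add("Failing over request from " + failedUrl + " to " + next);
        return next;
    }

    public static void main(String[] args) {
        UrlRotation rotation = new UrlRotation(Arrays.asList(
                "http://host1.example.com:50070/webhdfs",
                "http://host2.example.com:50070/webhdfs"));
        List<String> log = new ArrayList<>();

        // Simulate a failure on the currently active URL (host1).
        failover(rotation, rotation.activeUrl(), log);
        log.forEach(System.out::println);
    }
}
```

With this ordering, the entry written after a failure on host1 names host2 as the target, which is the behaviour the suggestion asks for.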


