Posted to dev@knox.apache.org by "Matthew Sharp (JIRA)" <ji...@apache.org> on 2018/09/06 13:11:00 UTC
[jira] [Created] (KNOX-1436) AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
Matthew Sharp created KNOX-1436:
-----------------------------------
Summary: AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
Key: KNOX-1436
URL: https://issues.apache.org/jira/browse/KNOX-1436
Project: Apache Knox
Issue Type: Bug
Reporter: Matthew Sharp
The current WebHDFS failoverRequest method makes it difficult to tell which host a request failed on and which host it will be retried against next.
Example:
{code:java}
2018-09-06 07:49:07,245 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:07,246 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:08,278 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:08,279 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,366 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
2018-09-06 07:49:10,367 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
2018-09-06 07:49:10,368 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to failover reached for service: WEBHDFS
{code}
In the example above, the request had already failed on host1.example.com, yet the first failover message, which claims to be failing over to a different server, still shows host1.example.com.
Suggestion:
The HaDispatchMessages failingOverRequest call should be moved below the markFailedURL call, so that it logs the next URI the request will fail over to (not the URI it has just failed on).
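To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the Knox source: FailoverLoggingSketch, UrlRing, and both method names are hypothetical stand-ins, and the round-robin rotation is only an assumption about how the active URL changes once a URL is marked as failed. The second method also names the failed host in the message, which is just one possible wording.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these names come from Knox.
class FailoverLoggingSketch {

    /** Minimal URL holder: the head of the queue is treated as the active URL. */
    static final class UrlRing {
        private final Deque<String> urls;

        UrlRing(List<String> initial) {
            if (initial.isEmpty()) {
                throw new IllegalArgumentException("at least one URL is required");
            }
            this.urls = new ArrayDeque<>(initial);
        }

        String activeUrl() {
            return urls.peekFirst();
        }

        /** Assumed behaviour: a failed active URL moves to the back of the queue. */
        void markFailedUrl(String failed) {
            if (failed.equals(urls.peekFirst())) {
                urls.addLast(urls.pollFirst());
            }
        }
    }

    /** Ordering described in the report: the message is built before the URL is marked failed. */
    static String logBeforeMark(UrlRing ring) {
        String message = "Failing over request to a different server: " + ring.activeUrl();
        ring.markFailedUrl(ring.activeUrl());
        return message; // names the host that just failed
    }

    /** Suggested ordering: mark the URL failed first, then build the message. */
    static String logAfterMark(UrlRing ring) {
        String failed = ring.activeUrl();
        ring.markFailedUrl(failed);
        return "Failing over request from " + failed + " to " + ring.activeUrl();
    }

    public static void main(String[] args) {
        List<String> hosts = List.of(
                "http://host1.example.com:50070",
                "http://host2.example.com:50070");
        System.out.println(logBeforeMark(new UrlRing(hosts)));
        System.out.println(logAfterMark(new UrlRing(hosts)));
    }
}
```

With the second ordering, each log line identifies both the host that failed and the host being tried next, which is the information that is hard to recover from the messages in the example above.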