Posted to dev@knox.apache.org by "Larry McCay (JIRA)" <ji...@apache.org> on 2018/09/26 20:56:00 UTC

[jira] [Assigned] (KNOX-1436) AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging

     [ https://issues.apache.org/jira/browse/KNOX-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Larry McCay reassigned KNOX-1436:
---------------------------------

    Assignee: Matthew Sharp

> AbstractHdfsHaDispatch failoverRequest - Improve Failover Logging
> -----------------------------------------------------------------
>
>                 Key: KNOX-1436
>                 URL: https://issues.apache.org/jira/browse/KNOX-1436
>             Project: Apache Knox
>          Issue Type: Bug
>            Reporter: Matthew Sharp
>            Assignee: Matthew Sharp
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: KNOX-1436.patch
>
>
> The current WebHDFS failoverRequest method makes it difficult to tell which host a request failed on and which host is being retried next.
> Example:
> {code:java}
> 2018-09-06 07:49:07,245 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:07,246 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:08,278 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:08,279 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(85)) - Received an error from a node in SafeMode: org.apache.knox.gateway.hdfs.dispatch.SafeModeException
> 2018-09-06 07:49:09,291 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host1.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,366 INFO knox.gateway (AbstractHdfsHaDispatch.java:executeRequest(82)) - Received an error from a node in Standby: org.apache.knox.gateway.hdfs.dispatch.StandbyException
> 2018-09-06 07:49:10,367 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(115)) - Failing over request to a different server: http://host2.example.com:50070/webhdfs/v1/user/matt/test.txt?op=CREATE&doAs=matt
> 2018-09-06 07:49:10,368 INFO knox.gateway (AbstractHdfsHaDispatch.java:failoverRequest(136)) - Maximum attempts 3 to failover reached for service: WEBHDFS
> {code}
> In the example above, host1.example.com had already failed, yet the message reports failing over to a different server while still showing host1.example.com.
> Suggestion:
> The HaDispatchMessages failingOverRequest log call should be moved below the markFailedURL call, so that it reports the next URI the request is failing over to, not the URI that has just failed.
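
The ordering problem described above can be illustrated with a minimal, self-contained sketch. This is not the Knox source: UrlRing, markFailed, logBeforeMark and logAfterMark are hypothetical names invented for this illustration, and the rotate-to-the-back behaviour is an assumption about how a failed URL gives way to the next candidate.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for an HA URL provider; NOT the Knox API.
class UrlRing {
    private final Deque<String> urls = new ArrayDeque<>();

    UrlRing(List<String> initial) {
        urls.addAll(initial);
    }

    // The URL that the next request attempt would use.
    String active() {
        return urls.peekFirst();
    }

    // Assumed behaviour: move the failed URL to the back so the next one becomes active.
    void markFailed(String url) {
        if (urls.remove(url)) {
            urls.addLast(url);
        }
    }
}

class FailoverLoggingSketch {
    private static final String PREFIX = "Failing over request to a different server: ";

    // Ordering implied by the report: build the log message first, then mark the failure.
    static String logBeforeMark(UrlRing ring) {
        String failed = ring.active();
        String message = PREFIX + failed;   // still names the host that just failed
        ring.markFailed(failed);
        return message;
    }

    // Suggested ordering: mark the failure first, then log the newly active URL.
    static String logAfterMark(UrlRing ring) {
        ring.markFailed(ring.active());
        return PREFIX + ring.active();      // names the host that will be tried next
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "http://host1.example.com:50070",
                "http://host2.example.com:50070");
        System.out.println(logBeforeMark(new UrlRing(hosts)));
        System.out.println(logAfterMark(new UrlRing(hosts)));
    }
}
```

With host1 active and failing, the first call produces a message naming host1 (the misleading case from the log excerpt), while the second names host2, which is the host actually tried next.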



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)