Posted to hdfs-dev@hadoop.apache.org by "Stephen O'Donnell (JIRA)" <ji...@apache.org> on 2018/07/11 16:23:00 UTC

[jira] [Created] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception

Stephen O'Donnell created HDFS-13727:
----------------------------------------

             Summary: Log full stack trace if DiskBalancer exits with an unhandled exception
                 Key: HDFS-13727
                 URL: https://issues.apache.org/jira/browse/HDFS-13727
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: diskbalancer
    Affects Versions: 3.0.3
            Reporter: Stephen O'Donnell


In HDFS-13175 it was discovered that when a DN reports the usage on a volume to be greater than the volume capacity, the disk balancer will fail with an unhelpful error:

{code}
$ hdfs diskbalancer -report -top 5

18/06/11 10:19:43 INFO command.Command: Processing report command
18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
{code}

In HDFS-13175, a change was made to include more details in the exception message, so after the change the code is:

{code}
  public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, getCapacity());
    this.used = dfsUsedSpace;
  }
{code}
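
To illustrate what the more detailed message looks like once it reaches the log, here is a minimal plain-Java sketch. The checkArgument method below is a hypothetical stand-in for the Guava Preconditions call used above (built on String.format), and the capacity and usage values are assumed purely for illustration:

```java
public class CheckArgumentSketch {

  // Hypothetical stand-in for the Guava Preconditions.checkArgument call above:
  // throws IllegalArgumentException with a formatted message when the
  // condition is false.
  static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  public static void main(String[] args) {
    long capacity = 100L;      // assumed value, for illustration only
    long dfsUsedSpace = 120L;  // assumed value: usage reported above capacity
    try {
      checkArgument(dfsUsedSpace < capacity,
          "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
          dfsUsedSpace, capacity);
    } catch (IllegalArgumentException ex) {
      // ex.toString() is the exception class name followed by the message.
      System.out.println(ex);
    }
  }
}
```

With a message attached, ex.toString() identifies the failing check and the offending values, but it still says nothing about the call path that led there.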

There may, however, be other scenarios that cause the disk balancer to exit with an unhandled exception, so it would be helpful if the tool logged the full stack trace on error rather than just the exception class name and message.

In DiskBalancerCLI.java, the relevant code is:

{code}
  public static void main(String[] argv) throws Exception {
    DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
    int res = 0;
    try {
      res = ToolRunner.run(shell, argv);
    } catch (Exception ex) {
      LOG.error(ex.toString());
      res = 1;
    }
    System.exit(res);
  }
{code}

We should change the logging call in the catch block to pass the exception itself, so that the full stack trace is logged and all unhandled errors carry more information, e.g.:

{code}
LOG.error(ex.toString(), ex);
{code}
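
The difference between the two forms can be sketched without a logging framework. This is only an approximation under stated assumptions: the helper names are hypothetical, and printStackTrace is used to stand in for what a logger typically emits when it is handed the Throwable as a second argument:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceLoggingSketch {

  // Roughly what LOG.error(ex.toString()) records: the class name, plus the
  // message if there is one.
  static String summaryOnly(Throwable ex) {
    return ex.toString();
  }

  // Approximates what LOG.error(ex.toString(), ex) records: the same summary
  // followed by one line per stack frame.
  static String withStackTrace(Throwable ex) {
    StringWriter buffer = new StringWriter();
    ex.printStackTrace(new PrintWriter(buffer, true));
    return buffer.toString();
  }

  // Hypothetical stand-in for a precondition failing somewhere inside the tool.
  static void failingCheck() {
    throw new IllegalArgumentException();
  }

  public static void main(String[] args) {
    try {
      failingCheck();
    } catch (IllegalArgumentException ex) {
      System.out.println("toString only:");
      System.out.println(summaryOnly(ex));
      System.out.println("with stack trace:");
      System.out.print(withStackTrace(ex));
    }
  }
}
```

For an exception thrown without a message, the first form yields only the class name, as in the report output above, while the second also names the method that threw it.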


