Posted to hdfs-dev@hadoop.apache.org by "LINTE (JIRA)" <ji...@apache.org> on 2015/08/14 16:46:45 UTC
[jira] [Created] (HDFS-8897) Loadbalancer
LINTE created HDFS-8897:
---------------------------
Summary: Loadbalancer
Key: HDFS-8897
URL: https://issues.apache.org/jira/browse/HDFS-8897
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Affects Versions: 2.7.1
Environment: Centos 6.6
Reporter: LINTE
When the balancer is launched, it should check whether a /system/balancer.id file already exists in HDFS.
When the file does not exist, the balancer still refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
According to the audit log, when the balancer is run it creates /system/balancer.id and then deletes it on exit:
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r----- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java.
The function checkAndMarkRunning returns null even if /system/balancer.id does not exist before the function is entered; if the file does exist, it is deleted and the balancer exits with the same error.
----
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch(RemoteException e) {
    if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())){
      return null;
    } else {
      throw e;
    }
  }
}
----
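To make the control flow of the quoted function easier to follow, here is a minimal, self-contained sketch that replays the same steps against an in-memory stand-in. It is not Hadoop code: FakeFs, AlreadyBeingCreated and BalancerLockSketch are hypothetical names, and the fake only models the assumption that a path already open for write rejects a second writer. It illustrates the one path through the function that returns null; it does not claim to identify the cause of this report.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashSet;
import java.util.Set;

/** In-memory illustration of the balancer.id lock-file steps; not Hadoop code. */
public class BalancerLockSketch {

    /** Hypothetical stand-in for AlreadyBeingCreatedException. */
    static class AlreadyBeingCreated extends IOException {
        AlreadyBeingCreated(String path) {
            super(path + " is already open for write");
        }
    }

    /** Hypothetical fake file system: tracks existing paths and open writers. */
    static class FakeFs {
        private final Set<String> existing = new HashSet<>();
        private final Set<String> openForWrite = new HashSet<>();

        boolean exists(String path) {
            return existing.contains(path);
        }

        /** Opens a writer on the path; closing the stream releases it. */
        private OutputStream open(String path) throws IOException {
            if (openForWrite.contains(path)) {
                throw new AlreadyBeingCreated(path);
            }
            openForWrite.add(path);
            return new ByteArrayOutputStream() {
                @Override
                public void close() {
                    openForWrite.remove(path);
                }
            };
        }

        /** Creates a new file without overwriting, like create(path, false). */
        OutputStream create(String path) throws IOException {
            if (existing.contains(path) && !openForWrite.contains(path)) {
                throw new IOException(path + " already exists");
            }
            OutputStream out = open(path);
            existing.add(path);
            return out;
        }

        OutputStream append(String path) throws IOException {
            return open(path);
        }

        void delete(String path) {
            existing.remove(path);
            openForWrite.remove(path);
        }
    }

    /** Same sequence of steps as the quoted function, minus the host name write. */
    static OutputStream checkAndMarkRunning(FakeFs fs, String idPath) throws IOException {
        try {
            if (fs.exists(idPath)) {
                // Fails fast if another writer still holds the file.
                fs.append(idPath).close();
                fs.delete(idPath);
            }
            return fs.create(idPath);
        } catch (AlreadyBeingCreated e) {
            // The caller treats null as "another balancer is running".
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        FakeFs fs = new FakeFs();
        String idPath = "/system/balancer.id";

        OutputStream first = checkAndMarkRunning(fs, idPath);
        OutputStream second = checkAndMarkRunning(fs, idPath);
        System.out.println("first holds lock: " + (first != null));
        System.out.println("second holds lock: " + (second != null));

        first.close(); // release the writer, leaving a stale id file behind
        OutputStream third = checkAndMarkRunning(fs, idPath);
        System.out.println("third holds lock: " + (third != null));
    }
}
```

In this sketch the second call returns null because the first caller still has the id file open, while the third call succeeds once that stream is closed, even though a stale file is left behind.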
Regards