Posted to by Boyu Zhang <> on 2010/04/08 20:09:53 UTC

HOD: JobTracker failed to initialise

Dear All,

I am trying to install HOD on a cluster. When I tried to allocate a new
Hadoop cluster, I got the following error:

[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker failed to initialise

*The log file ringmaster.log has the following message:*

[2010-04-08 13:46:22,297] DEBUG/10 ringMaster:479 - getServiceAddr name:
[2010-04-08 13:46:22,299] DEBUG/10 ringMaster:487 - getServiceAddr service:
<hodlib.GridServices.hdfs.Hdfs instance at 0x2057b758>
[2010-04-08 13:46:22,300] DEBUG/10 ringMaster:504 - getServiceAddr addr
hdfs: not found

*The log file hodring.log has the following message:*

[2010-04-08 13:46:31,749] DEBUG/10 hodRing:416 - hadoopThread still == None
[2010-04-08 13:46:31,750] DEBUG/10 hodRing:419 - hadoop input: None
[2010-04-08 13:46:31,752] DEBUG/10 hodRing:428 - isForground: False
[2010-04-08 13:46:31,753] DEBUG/10 hodRing:440 - hadoop run status: True
[2010-04-08 13:46:31,754] DEBUG/10 hodRing:657 - Waiting for jobtracker to
[2010-04-08 13:46:31,755] DEBUG/10 hodRing:659 - jobtracker version : 20
[2010-04-08 13:46:31,756] DEBUG/10 hodRing:664 - jobtracker rpc server :
[2010-04-08 13:46:31,757] DEBUG/10 hodRing:670 - Jobtracker jetty : n2:57775
[2010-04-08 13:46:32,042] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 0.5
[2010-04-08 13:46:33,544] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 1.0
[2010-04-08 13:46:35,545] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 2.0
[2010-04-08 13:46:38,546] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 4.0
[2010-04-08 13:46:43,547] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 8.0
[2010-04-08 13:46:52,548] DEBUG/10 hodRing:713 - Jetty gave a socket error.
Sleeping for 16.0
[2010-04-08 13:47:08,552] CRITICAL/50 hodRing:723 - Jobtracker failed to
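
The doubling sleeps in the hodring log above (0.5 s up to 16 s, then failure) look like an exponential backoff while polling the JobTracker's Jetty port. A minimal Python sketch of that pattern follows. It is not HOD's actual code: the function name wait_for_port and all of its parameters are hypothetical, and the limits are only read off the log.

```python
import socket
import time


def wait_for_port(host, port, first_delay=0.5, max_delay=16.0, timeout=1.0,
                  sleep=time.sleep):
    """Return True once host:port accepts a TCP connection, else False.

    Hypothetical helper: retries with a delay that doubles after every
    failed attempt, and gives up once the delay would exceed max_delay.
    """
    delay = first_delay
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            if delay > max_delay:
                return False
            sleep(delay)
            delay *= 2
```

For example, wait_for_port("n2", 57775) would poll the Jetty address shown in the log. With the defaults it sleeps 0.5, 1, 2, 4, 8 and 16 seconds (31.5 seconds in total, not counting connect timeouts) before returning False.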

*The log file hadoop.log on the actual compute node n0 has:*

2010-04-08 17:47:24,424 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: org.apache.hadoop.ipc.RemoteException: File
/scratch/hod/mapredsys/zhang/mapredsystem/ could only be replicated to 0
nodes, instead of 1
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

It looks like the HDFS daemon failed to start, so the JobTracker had nothing
to communicate with, and Jetty then gave an error.

I am using Hadoop 0.20.2 on Scyld OS. The cluster uses 0-5 (n0-n5) to refer
to the back-end compute nodes. Has anyone had this problem before? Any help
will be appreciated.

P.S. Jetty*** tmp files are generated under /tmp on the compute nodes, even
though I set all the tmp dirs to /home or /scratch. Any idea why?

Here is my hod conf file:

[hod]
stream                          = True
java-home                       = /usr
cluster                         = geronimo
cluster-factor                  = 1.8
xrs-port-range                  = 32768-65536
debug                           = 4
allocate-wait-time              = 3600
temp-dir                        = /home/zhang/hodtmp.$PBS_JOBID

[ringmaster]
register                        = True
stream                          = False
temp-dir                        = /scratch/hod/ringmastertmp.$PBS_JOBID
http-port-range                 = 8000-9000
work-dirs                       = /scratch/hod/tmp/1,/scratch/hod/tmp/2
xrs-port-range                  = 32768-65536
debug                           = 4

[hodring]
stream                          = False
temp-dir                        = /scratch/hod/hodringtmp.$PBS_JOBID
register                        = True
java-home                       = /usr
http-port-range                 = 8000-9000
xrs-port-range                  = 32768-65536
debug                           = 4
mapred-system-dir-root          = /scratch/hod/mapredsys

[resource_manager]
queue                           = batch
batch-home                      = /usr
id                              = torque
env-vars                        =

[gridservice-mapred]
external                        = False
pkgs                            = /home/zhang/hadoop-0.20.2
tracker_port                    = 8030
info_port                       = 50080

[gridservice-hdfs]
external                        = False
pkgs                            = /home/zhang/hadoop-0.20.2
fs_port                         = 8020
info_port                       = 50070
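
To double-check the P.S. above, the directory options in the conf file can be listed with a short Python sketch like the one below. It assumes the file parses as a standard INI file with the usual HOD section names ([hod], [ringmaster], [hodring]); the sample string copies only the directory options from the file above, and the helper names directory_settings and under_tmp are hypothetical.

```python
import configparser

# Directory options copied from the conf file above; the section
# names are the usual HOD ones and are an assumption here.
SAMPLE = """
[hod]
temp-dir = /home/zhang/hodtmp.$PBS_JOBID

[ringmaster]
temp-dir = /scratch/hod/ringmastertmp.$PBS_JOBID
work-dirs = /scratch/hod/tmp/1,/scratch/hod/tmp/2

[hodring]
temp-dir = /scratch/hod/hodringtmp.$PBS_JOBID
mapred-system-dir-root = /scratch/hod/mapredsys
"""

DIR_OPTIONS = ("temp-dir", "work-dirs", "mapred-system-dir-root")


def directory_settings(text):
    """Return (section, option, path) for every directory option found."""
    # interpolation=None keeps values such as $PBS_JOBID untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    found = []
    for section in parser.sections():
        for option in DIR_OPTIONS:
            if parser.has_option(section, option):
                # work-dirs holds a comma-separated list of paths.
                for path in parser.get(section, option).split(","):
                    found.append((section, option, path.strip()))
    return found


def under_tmp(settings):
    """Return the settings whose path is /tmp or lives below it."""
    return [s for s in settings
            if s[2] == "/tmp" or s[2].startswith("/tmp/")]


if __name__ == "__main__":
    for section, option, path in directory_settings(SAMPLE):
        print(section, option, path)
    print("under /tmp:", under_tmp(directory_settings(SAMPLE)))
```

For this sample, under_tmp returns an empty list, which matches the statement that every configured directory is under /home or /scratch. It only inspects the conf file, so it says nothing about where the Jetty*** files come from.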

Thanks a lot!!
