Posted to common-user@hadoop.apache.org by Boyu Zhang <bo...@gmail.com> on 2010/04/08 20:09:53 UTC

HOD: JobTracker failed to initialise

Dear All,

I am trying to install HOD on a cluster. When I tried to allocate a new
Hadoop cluster, I got the following error:

[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker failed to initialise

*The log file ringmaster.log has the following message:*

[2010-04-08 13:46:22,297] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-04-08 13:46:22,299] DEBUG/10 ringMaster:487 - getServiceAddr service: <hodlib.GridServices.hdfs.Hdfs instance at 0x2057b758>
[2010-04-08 13:46:22,300] DEBUG/10 ringMaster:504 - getServiceAddr addr hdfs: not found

*The log file hodring.log has the following message:*

[2010-04-08 13:46:31,749] DEBUG/10 hodRing:416 - hadoopThread still == None ...
[2010-04-08 13:46:31,750] DEBUG/10 hodRing:419 - hadoop input: None
[2010-04-08 13:46:31,752] DEBUG/10 hodRing:428 - isForground: False
[2010-04-08 13:46:31,753] DEBUG/10 hodRing:440 - hadoop run status: True
[2010-04-08 13:46:31,754] DEBUG/10 hodRing:657 - Waiting for jobtracker to initialise
[2010-04-08 13:46:31,755] DEBUG/10 hodRing:659 - jobtracker version : 20
[2010-04-08 13:46:31,756] DEBUG/10 hodRing:664 - jobtracker rpc server : n2:59664
[2010-04-08 13:46:31,757] DEBUG/10 hodRing:670 - Jobtracker jetty : n2:57775
[2010-04-08 13:46:32,042] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 0.5
[2010-04-08 13:46:33,544] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 1.0
[2010-04-08 13:46:35,545] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 2.0
[2010-04-08 13:46:38,546] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 4.0
[2010-04-08 13:46:43,547] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 8.0
[2010-04-08 13:46:52,548] DEBUG/10 hodRing:713 - Jetty gave a socket error. Sleeping for 16.0
4864033937778270/hdfs-nn/dfs-name']
[2010-04-08 13:47:08,552] CRITICAL/50 hodRing:723 - Jobtracker failed to initialise.
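The hodring log shows the wait loop retrying the JobTracker's Jetty address (n2:57775) with a doubling sleep of 0.5, 1, 2, 4, 8 and 16 seconds, then declaring failure right after the last sleep. The sketch below is a hypothetical Python 3 reconstruction of that polling pattern, not HOD's actual code (HOD itself runs on Python 2). `backoff_schedule` and `wait_for_port` are illustrative names, and treating "socket error" as a failed TCP connect is an assumption.

```python
import socket
import time


def backoff_schedule(initial=0.5, maximum=16.0):
    """Doubling sleep intervals, e.g. 0.5, 1, 2, 4, 8, 16 as seen in hodring.log."""
    delays = []
    delay = initial
    while delay <= maximum:
        delays.append(delay)
        delay *= 2
    return delays


def wait_for_port(host, port, initial=0.5, maximum=16.0, timeout=1.0, sleep=time.sleep):
    """Hypothetical helper: poll host:port until a TCP connect succeeds.

    Each failed attempt sleeps for the next interval in the schedule; once the
    schedule is exhausted the function gives up and returns False, mirroring
    the CRITICAL line that follows the final 16-second sleep in the log.
    """
    for delay in backoff_schedule(initial, maximum):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:  # refused, unreachable or timed out
            sleep(delay)
    return False
```

Calling `wait_for_port("n2", 57775)` from the head node is one way to check by hand whether the JobTracker's web server ever started listening; the total sleep budget is 31.5 seconds, plus the time spent on each connect attempt.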

*The log file hadoop.log on the actual compute node n0 has:*

2010-04-08 17:47:24,424 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /scratch/hod/mapredsys/zhang/mapredsystem/85.geronimo.gcl.cis.udel.edu/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--------------------------------------------------------------------------------------------------
It looks like the HDFS daemon failed to start, so the JobTracker (JT) had nothing to communicate with, and Jetty then gave an error.
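The "could only be replicated to 0 nodes, instead of 1" message is what the NameNode reports when it cannot place a block on any DataNode; a count of 0 commonly means no DataNode has registered with it, which fits the guess above. The sketch below is a hypothetical triage helper (`REPLICATION_ERROR` and `diagnose` are illustrative names, not part of Hadoop or HOD) that pulls those errors out of a log so the target count and the achieved count can be compared.

```python
import re

# Matches the NameNode message quoted in hadoop.log, e.g.
# "File /x/jobtracker.info could only be replicated to 0 nodes, instead of 1"
REPLICATION_ERROR = re.compile(
    r"File (?P<path>\S+) could only be replicated to (?P<got>\d+) nodes?,"
    r" instead of (?P<want>\d+)"
)


def diagnose(log_text):
    """Return (path, replicas_achieved, replicas_wanted) for each match in log_text."""
    return [
        (m.group("path"), int(m.group("got")), int(m.group("want")))
        for m in REPLICATION_ERROR.finditer(log_text)
    ]
```

A result with `replicas_achieved == 0` points at the DataNodes rather than the JobTracker, so their own logs on n0-n5 would be the next place to look.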

I am using Hadoop 0.20.2 on Scyld OS; the cluster uses 0-5 (n0-n5) to refer to the back-end compute nodes. Has anyone had this problem before? Any help would be appreciated.
P.S. Temporary Jetty*** files are being generated under /tmp on the compute nodes, even though I set all the temp dirs to /home or /scratch. Any idea why?


Here is my HOD configuration file:

[hod]
stream                          = True
java-home                       = /usr
cluster                         = geronimo
cluster-factor                  = 1.8
xrs-port-range                  = 32768-65536
debug                           = 4
allocate-wait-time              = 3600
temp-dir                        = /home/zhang/hodtmp.$PBS_JOBID

[ringmaster]
register                        = True
stream                          = False
temp-dir                        = /scratch/hod/ringmastertmp.$PBS_JOBID
http-port-range                 = 8000-9000
work-dirs                       = /scratch/hod/tmp/1,/scratch/hod/tmp/2
xrs-port-range                  = 32768-65536
debug                           = 4

[hodring]
stream                          = False
temp-dir                        = /scratch/hod/hodringtmp.$PBS_JOBID
register                        = True
java-home                       = /usr
http-port-range                 = 8000-9000
xrs-port-range                  = 32768-65536
debug                           = 4
mapred-system-dir-root          = /scratch/hod/mapredsys

[resource_manager]
queue                           = batch
batch-home                      = /usr
id                              = torque
env-vars                        = HOD_PYTHON_HOME=/opt/python/2.5.1/bin/python

[gridservice-mapred]
external                        = False
pkgs                            = /home/zhang/hadoop-0.20.2
tracker_port                    = 8030
info_port                       = 50080

[gridservice-hdfs]
external                        = False
pkgs                            = /home/zhang/hadoop-0.20.2
fs_port                         = 8020
info_port                       = 50070
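To check the P.S. against the configuration, the hypothetical sketch below (`temp_like_settings` and `under_tmp` are illustrative names) parses a hodrc with Python 3's `configparser` and lists every key whose name contains "temp", "tmp" or "dir", flagging any that point at /tmp. With the file above none do, which suggests, as an unverified assumption, that the Jetty files come from a location the hodrc does not control, such as the JVM's default temporary directory.

```python
import configparser


def temp_like_settings(text):
    """Return {(section, key): value} for hodrc keys that look like directory settings."""
    # interpolation=None keeps values such as "$PBS_JOBID" or "%" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if "temp" in key or "tmp" in key or "dir" in key:
                found[(section, key)] = value
    return found


def under_tmp(settings):
    """Return the sorted (section, key) pairs whose value (or any comma-separated part) is under /tmp."""
    hits = []
    for key, value in settings.items():
        parts = [p.strip() for p in value.split(",")]
        if any(p == "/tmp" or p.startswith("/tmp/") for p in parts):
            hits.append(key)
    return sorted(hits)
```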




Thanks a lot!!

Boyu