Posted to common-dev@hadoop.apache.org by Alex Zheng <al...@gmail.com> on 2009/03/15 12:45:32 UTC

what is the relation between the classes at the very beginning?

I am new to Hadoop and have been reading its code for a week.
I am very puzzled by how so many classes relate to each other after I run:
bin/start-all.sh

I know there are JobTrackerInstrumentation, JobTracker, NameNode, etc. In
what order are they initialized?
And after bin/start-all.sh, before I run any job, what exists in the
system?

Thanks for your reply!

Re: what is the relation between the classes at the very beginning?

Posted by Steve Loughran <st...@apache.org>.
Alex Zheng wrote:
> I am new to Hadoop and have been reading its code for a week.
> I am very puzzled by how so many classes relate to each other after I run:
> bin/start-all.sh
> 
> I know there are JobTrackerInstrumentation, JobTracker, NameNode, etc. In
> what order are they initialized?
> And after bin/start-all.sh, before I run any job, what exists in the
> system?
> 

Run jps -v to see which Java processes are up and about, and netstat -p 
to list the ports in use by the different processes.

The nodes are all designed to spin a bit waiting for their dependencies 
to come up, so you don't need to bring them up in a strict order. For a 
full MR cluster, that order would be namenode, datanode(s), jobtracker, 
tasktracker(s).

I have tests that poll for the various ports to be open before 
submitting work, and they sometimes get unhappy if you try submitting 
jobs straight after the job tracker appears live. If you are going to 
spin waiting for a job tracker to be visible, I would sleep a few 
seconds after its IPC port opens up before sending in work. This is 
clearly some race condition, but not anything I've sat down to look at, 
as it only shows up at startup and a 10s sleep makes it go away.
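
The wait-then-sleep approach described above could look roughly like the 
sketch below. It is not the code from the tests mentioned; it is a minimal 
illustration in Python. The function names wait_for_port and 
wait_until_ready, and all timeout values other than the 10s settle period, 
are assumptions made up for the example.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, poll_interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open,
            # not that the service behind it is ready for work.
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            time.sleep(poll_interval)
    return False


def wait_until_ready(host, port, settle_seconds=10.0, timeout=60.0,
                     poll_interval=1.0):
    """Wait for the port to open, then sleep a settle period before returning."""
    if not wait_for_port(host, port, timeout, poll_interval):
        return False
    # Extra pause to step around the startup race described above.
    time.sleep(settle_seconds)
    return True
```

A caller would pass the job tracker's host and IPC port (whatever your 
configuration names for it) and submit work only once wait_until_ready 
returns True. The sleep is a workaround for the race, not a fix for it.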