Posted to common-user@hadoop.apache.org by Kevin Tse <ke...@gmail.com> on 2010/06/06 04:00:32 UTC

Several questions about Hadoop

Hello,
I have successfully run Hadoop on a 3-node cluster on Red Hat Linux, and I
have several questions.

1. When I submit an MR job using "hadoop jar mr-job.jar", it starts printing
log messages to stdout. How do I make it run in the background? ("hadoop jar
mr-job.jar > log &" does not work.) If it can be put in the background, where
do I find the log messages that it used to print to stdout?
2. While the MR job is running, will it be affected or killed if I press
Ctrl-C? It seems not, since I can see that the tasktracker is still running,
but I am not sure.
3. While the MR job is running, if I stop one of the tasktrackers/nodes in
the cluster using hadoop-daemon.sh, will the partial results of the maps and
reduces on that tasktracker/node be submitted to the namenode and merged with
the results completed by other tasktrackers? Is it possible to restart a
tasktracker from the point where it was stopped?

Thank you in advance.
- Kevin Tse

Re: Several questions about Hadoop

Posted by Gang Luo <lg...@yahoo.com.cn>.

1. Try JobClient.submitJob(JobConf). It submits the job to Hadoop and returns without waiting for the job to complete.

2. No, it will not be killed.

3. Tasks running on nodes that fail on the fly will be rescheduled on other nodes. Their incomplete results will not be used.
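
For question 1, a possible reason that "> log &" appears not to work is that the Hadoop client writes its progress messages to stderr rather than stdout (this is an assumption about the default log4j console settings, not something confirmed in this thread). The sketch below shows one way to start the client detached and capture both streams in a file, using only the standard Java ProcessBuilder API. The class name BackgroundLauncher and the file names mr-job.jar and mr-job.log are placeholders.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class BackgroundLauncher {

    /**
     * Starts the command without waiting for it to finish and sends both
     * stdout and stderr to logFile, appending if the file already exists.
     */
    static Process launch(List<String> command, File logFile) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so one redirect captures everything.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        // start() returns as soon as the child process has been created.
        return builder.start();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder names: adjust the jar and log file to your job.
        List<String> command = Arrays.asList("hadoop", "jar", "mr-job.jar");
        launch(command, new File("mr-job.log"));
        System.out.println("Client started; output goes to mr-job.log");
    }
}
```

From a shell, the rough equivalent is to redirect stderr as well, for example "hadoop jar mr-job.jar > log 2>&1 &".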

Thanks,
-Gang




Re: Several questions about Hadoop

Posted by Ted Yu <yu...@gmail.com>.

1. If you log with org.apache.log4j.Logger, for example
    private final static Logger LOG = Logger.getLogger(AbstractRepMapper.class);
you can view the log messages at
http://jobtracker:50060/tasklog?taskid=(attempt_201006052003_0003_m_000001_0&start=-8193)

2. After you have submitted a job to the Hadoop cluster, you need to use
"hadoop job -kill <job_id>" to stop it.
See
http://support.cc.gatech.edu/facilities/instructional-labs/how-to-submit-and-manage-hadoop-jobs
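
To make the shape of the task-log URL in point 1 concrete, here is a minimal sketch that assembles it from its parts. It assumes the parentheses in the URL above only mark the example values and are not part of the real address, and that the "taskid" and "start" parameters behave as that URL suggests. The class name TaskLogUrl and its build method are hypothetical helpers, not part of Hadoop.

```java
public class TaskLogUrl {

    /**
     * Builds a task-log URL from the web UI host and port, a task attempt
     * ID, and a start offset (a negative offset is assumed to count back
     * from the end of the log).
     */
    static String build(String host, int port, String attemptId, long start) {
        return "http://" + host + ":" + port
                + "/tasklog?taskid=" + attemptId
                + "&start=" + start;
    }

    public static void main(String[] args) {
        // Values taken from the example URL in this message.
        System.out.println(build("jobtracker", 50060,
                "attempt_201006052003_0003_m_000001_0", -8193));
    }
}
```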
