Posted to common-dev@hadoop.apache.org by LiuShuguang <ho...@hotmail.com> on 2007/10/12 10:48:59 UTC

Enhancement to Hadoop

Hello,

I made some enhancements to Hadoop, based on 0.14, but I do not know how to distribute them to others. Can you help me with that?

The following are the commands and their manual pages.

regards,
Shuguang Liu


hbot(1)                      HADOOP                       hbot(1)

NAME
	hbot - Move a job to the bottom of the job queue, relative to your last job.
 
SYNOPSIS
	hbot [jobid] 

DESCRIPTION
	Move the specified job relative to your last job in the queue.

	If invoked by a regular user, hbot moves the selected job after the
	last job with the same priority submitted by the user.

	If invoked by the Hadoop administrator, hbot moves the selected job
	after the last job with the same priority submitted to the queue.

EXAMPLE
	% hbot job_200709181000_0001
	Job  has been moved to position 1 from bottom.
AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	htop(1), hjobs(1)

hbot (hadoop utils) 13.0		October 2007 
htop(1)                      HADOOP                       htop(1)

NAME
	htop - Move a job to the top of the job queue.
 
SYNOPSIS
	htop [jobid] 

DESCRIPTION
	Move the specified job relative to your first job in the queue.

	If invoked by a regular user, htop moves the selected job before the
	first job with the same priority submitted by the user.

	If invoked by the Hadoop administrator, htop moves the selected job
	before the first job with the same priority submitted to the queue.
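
The reordering rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind htop or hbot: the Job record, the list-based queue, and the function names are hypothetical, and the hbot behaviour (insert after the last matching job) is inferred from its NAME line.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    priority: int


def _split(queue, job_id):
    """Return the selected job and the queue without it."""
    job = next(j for j in queue if j.job_id == job_id)
    rest = [j for j in queue if j.job_id != job_id]
    return job, rest


def _is_peer(job, other, is_admin):
    """Same priority; a regular user only competes with their own jobs."""
    return other.priority == job.priority and (is_admin or other.user == job.user)


def move_to_top(queue, job_id, is_admin=False):
    """htop-like move: place the job before the first peer job."""
    job, rest = _split(queue, job_id)
    for i, other in enumerate(rest):
        if _is_peer(job, other, is_admin):
            return rest[:i] + [job] + rest[i:]
    return list(queue)  # no peer found: leave the order unchanged


def move_to_bottom(queue, job_id, is_admin=False):
    """hbot-like move (assumed): place the job after the last peer job."""
    job, rest = _split(queue, job_id)
    for i in range(len(rest) - 1, -1, -1):
        if _is_peer(job, rest[i], is_admin):
            return rest[:i + 1] + [job] + rest[i + 1:]
    return list(queue)
```

With a queue of jobs a (u1), b (u2), c (u1), d (u2), all at the same priority, move_to_top(queue, "d") yields a, d, b, c for the regular user u2, while the administrator variant yields d, a, b, c.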

EXAMPLE
	% htop job_200709181000_0001
	Job  has been moved to position 1 from top.
AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hbot(1), hjobs(1)
htop (hadoop utils) 13.0		October 2007 

hopen(1)                      HADOOP                       hopen(1)

NAME
	hopen - open a TaskTracker to accept new tasks. 
 
SYNOPSIS
	hopen [-h] [-jt jt1;jt2..] HOSTNAME 

DESCRIPTION
	Open the TaskTracker specified by HOSTNAME. Once reopened, it accepts
	new MAP/REDUCE tasks again.

	-h
		Display usage information

	-jt jobtracker url link
		Specify the cluster by its jobtracker url link(s), for example:
			-jt hostA:3479;hostB:3479
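
The -jt value in the example is a semicolon-separated list of host:port pairs. A minimal Python sketch of how such a value could be split follows; the function name parse_jobtrackers is hypothetical and the real tools may parse the option differently.

```python
def parse_jobtrackers(value):
    """Split 'hostA:3479;hostB:3479' into [('hostA', 3479), ('hostB', 3479)]."""
    trackers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        trackers.append((host, int(port)))
    return trackers
```

Note that most shells treat an unquoted ';' as a command separator, so on the command line the value generally has to be quoted, for example -jt 'hostA:3479;hostB:3479'.
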
EXAMPLE
	% hhosts
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   closed  8       0       0       0       553M

	% hopen c0102
	Host  will be open, please confirm by hhosts/hload

	% hhosts                                                        
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   ok      8       0       0       0       553M
AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hclose(1), hslot(1), hhosts(1), hload(1)

hopen (hadoop utils) 13.0		September 2007 
hclose(1)                      HADOOP                       hclose(1)

NAME
	hclose - close a TaskTracker so that it accepts no new tasks.
 
SYNOPSIS
	hclose [-h] [-jt jt1;jt2..] HOSTNAME 

DESCRIPTION
	Close the TaskTracker specified by HOSTNAME. Once closed, it will
	not accept new MAP/REDUCE tasks, but tasks currently running will
	continue until they finish. This is useful for host maintenance.
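
The open/closed behaviour described above amounts to a small state machine. The Python sketch below models it under assumptions: the class and method names are hypothetical and are not taken from the Hadoop sources or from these tools.

```python
class TaskTrackerModel:
    """Toy model of a TaskTracker that can be closed for maintenance."""

    def __init__(self, hostname, max_slots):
        self.hostname = hostname
        self.max_slots = max_slots
        self.closed = False
        self.running = []

    def status(self):
        return "closed" if self.closed else "ok"

    def close(self):
        # Closing only stops new assignments; running tasks are kept.
        self.closed = True

    def open(self):
        self.closed = False

    def offer(self, task):
        """Try to assign a new task; return True if it was accepted."""
        if self.closed or len(self.running) >= self.max_slots:
            return False
        self.running.append(task)
        return True

    def finish(self, task):
        self.running.remove(task)
```

A host can therefore be closed, left to drain until its running list is empty, serviced, and then opened again.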

	-h
		Display usage information

	-jt jobtracker url link
		Specify the cluster by its jobtracker url link(s), for example:
			-jt hostA:3479;hostB:3479
EXAMPLE
	% hhosts
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   ok      8       0       0       0       553M

	% hclose c0102
	Host  will be closed, please confirm by hhosts/hload

	% hhosts                                                        
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   closed  8       0       0       0       553M
AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hopen(1), hslot(1), hhosts(1), hload(1)
hclose (hadoop utils) 13.0		September 2007 

hslot(1)                      HADOOP                       hslot(1)

NAME
	hslot - Set the maximum number of tasks that can be run at the same
			time on a specified host on the fly
 
SYNOPSIS
	hslot [-h] [-jt jt1;jt2..] HOSTNAME SLOTNUMBER

DESCRIPTION
	hslot sets the maximum number of available task slots on the
	TaskTracker specified by HOSTNAME. With this command, users can change
	the number of concurrently running tasks dynamically, so that jobs can
	get better performance.

	For example, if the maximum is increased, the TaskTracker will be able
	to accept new tasks. Setting SLOTNUMBER to a very large number,
	however, is not sensible.

	For a TaskTracker that is closed, nothing changes: it behaves as before.

	For a TaskTracker that is open, there are two situations:
		A:  Increase the slots:
			The TaskTracker will be able to accept new tasks; the number
			it can accept is MAX - CURRENT_RUNNING.

		B:  Decrease the slots:
			Tasks currently running will not be affected; over time,
			there will be at most MAX tasks running.
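
The two situations can be checked with a little arithmetic. The following Python sketch is an illustration only; free_slots and step are hypothetical helper names, and the scheduling round they model is an assumption rather than the actual TaskTracker logic.

```python
def free_slots(max_slots, running):
    """Number of new tasks a host can accept: MAX - CURRENT_RUNNING, never negative."""
    return max(0, max_slots - running)


def step(running, max_slots, finished, pending):
    """One scheduling round: finished tasks leave, then free slots are filled.

    Returns the new (running, pending) counts. Running tasks are never
    killed, so after MAX is lowered the host simply drains down to it.
    """
    running -= min(finished, running)
    started = min(pending, free_slots(max_slots, running))
    return running + started, pending - started
```

For instance, lowering MAX from 8 to 4 while 6 tasks are running gives free_slots(4, 6) == 0: nothing is killed, and new tasks are accepted only once fewer than 4 remain.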

	-h
		Display usage information

	-jt jobtracker url link
		Specify the cluster by its jobtracker url link(s), for example:
			-jt hostA:3479;hostB:3479
EXAMPLE
	% hhosts
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   ok      8       0       0       0       553M

	% hslot c0102 4
	Host  Task Slots will be set to , please confirm by hhosts/hload

	% hhosts                                                        
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      8       0       0       0       203M
	c0102   ok      4       0       0       0       553M

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hopen(1), hclose(1), hhosts(1), hload(1)
hslot (hadoop utils) 13.0		September 2007 
hjobs(1)                      HADOOP                       hjobs(1)

NAME
	hjobs - list finished or running Hadoop jobs
 
SYNOPSIS
	hjobs [OPTION]... [JOBID]...

DESCRIPTION
	hjobs lists Hadoop jobs. If no parameters are specified, this command
	lists only the queued and running jobs.

	-u user_name
		Display only the jobs submitted by user_name.

	-l jobid
		Display information about jobid in long format.

	-jt jobtracker url link
		Display the job information of the cluster specified by the
		jobtracker url link, for example: -jt hostA:3479;hostB:3479

EXAMPLE
	List jobs

	% hjobs
	

	JOBID	USER   STAT     FROM_HOST   JOB_NAME   SUBMIT_TIME       	
	0023	user1  RUN      hostA       WordCount  Sep 10 08:33
	0024	user1  PEND     hostA       WordCount  Sep 10 08:47
	0025	user2  PEND     hostB       WordCount  Sep 10 09:12

	Run hjobs -u user_name to display jobs for a specific user. 

	% hjobs -u user1
	

	JOBID	USER   STAT     FROM_HOST   JOB_NAME   SUBMIT_TIME       	
	0023	user1  RUN      hostA       WordCount  Sep 10 08:33
	0024	user1  PEND     hostA       WordCount  Sep 10 08:47

	% hjobs -l job_200708312310_0005
	Job , User , Status 

	Thu Jan 01 08:00:00: Submitted, JobName 
						 Input Files 
						 Output Path 

	MAP                : Total Progress 
						 Total Number of MAP Tasks 
						 Total Number of Finished MAPs 
						 Total Number of Running MAPs 
						 Total Number of Failed MAPs 
						 

	Sun Sep 16 19:29:59: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 
	Sun Sep 16 19:30:36: Map Task  Finished.

	Sun Sep 16 19:30:01: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 
	Sun Sep 16 19:30:36: Map Task  Finished.

	Sun Sep 16 19:30:07: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 

	Sun Sep 16 19:30:36: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 
	Sun Sep 16 19:30:44: Map Task  Finished.

	Sun Sep 16 19:30:40: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 

	Sun Sep 16 19:30:44: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 
	Sun Sep 16 19:30:59: Map Task  Finished.

	Sun Sep 16 19:30:46: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 
	Sun Sep 16 19:31:02: Map Task  Finished.

	Sun Sep 16 19:31:02: Map Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times 

	REDUCE             : Total Progress 
						 Total Number of Reduce Tasks 
						 Total Number of Finished Reduces 
						 Total Number of Running Reduces 
						 Total Number of Failed Reduces 
						 

	Sun Sep 16 19:30:10: Reduce Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times  

	Sun Sep 16 19:30:11: Reduce Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times  

	Sun Sep 16 19:30:53: Reduce Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times  

	Sun Sep 16 19:30:54: Reduce Task , State  Launched on hosts: 
						 
						 Failure Times  Kill Times  
						 Reduce Task , State 
						 Reduce Task , State 
						 Reduce Task , State 
						 Reduce Task , State 

	URL                : 

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hjobkill(1), htop(1), hbot(1)

hjobs (hadoop utils) 13.0		October 2007 
hjobkill(1)                      HADOOP                       hjobkill(1)

NAME
	hjobkill - Kill jobs given their job IDs and cluster; the jobs may be
			   rerun, depending on the options.
 
SYNOPSIS
	hjobkill [OPTION] jobid1 jobid2 ...

DESCRIPTION
	hjobkill kills the jobs specified by its arguments. If the argument
	is 0, all jobs are terminated.
 
	-u user_name
		With this option, hjobkill operates only on jobs submitted by
		user_name.

	-jt jobtracker url link
		Specify the cluster by the jobtracker url link, for example:
		-jt hostA:3479;hostB:3479

	-r
		If this option is present, the job is rescheduled and run again.
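
How the arguments might select jobs can be sketched as follows. This is a hedged Python illustration: select_jobs is a hypothetical name, the mapping of job IDs to owners is invented for the example, and the reading of -u as a filter on the job owner is an assumption.

```python
def select_jobs(active, job_ids, user=None):
    """Pick the jobs to kill.

    active  -- dict mapping job ID to the submitting user
    job_ids -- job IDs from the command line; "0" means every active job
    user    -- optional owner filter (assumed meaning of -u)
    """
    if "0" in job_ids:
        selected = list(active)
    else:
        selected = [job_id for job_id in job_ids if job_id in active]
    if user is not None:
        selected = [job_id for job_id in selected if active[job_id] == user]
    return selected
```

Unknown job IDs are silently skipped in this sketch; a real tool would more likely report them.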

EXAMPLE
	% hjobkill job_200708312310_0001
	Job  has been killed successfully.

	% hjobkill -r job_200708312310_0001
	Job  has been killed successfully.
	Job  has been rerun. 

	% hjobkill 0
	Job  has been killed successfully.
	Job  has been killed successfully.
	Job  has been killed successfully.	

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hjobs(1)

hjobkill (hadoop utils) 13.0		October 2007 
htasks(1)                      HADOOP                       htasks(1)

NAME
	htasks -
 
SYNOPSIS
	htasks [OPTION]...

DESCRIPTION
	Add something here.

	-a, --all
		the option

	-b, --ba
		the option

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO

htasks (hadoop utils) 13.0		October 2007 
hload(1)                      HADOOP                       hload(1)

NAME
	hload - Display load information about cluster hosts at 5-second
			intervals.
 
SYNOPSIS
	hload [OPTION]...

DESCRIPTION
	By default, hload displays load information about all hosts in the
	specified cluster. This command will display the host status, cpu
	usage, tmp space, idle time and so on.

	-h
		Display usage information.

	-jt jobtracker url
		Specify the cluster by the jobtracker's url link, for
		example: -jt hostA:3479;hostB:3479

OUTPUT
	By default, hload displays the following fields:

	HOST
		The name of the host. Every host that currently belongs to the
		cluster specified by the command is listed here.

	status
		The current status of the host (in fact, the status of its
		TaskTracker daemon). The possible values are as follows.
		ok
			The host is available to accept new tasks.
		closed
			The host is not allowed to accept new tasks any more, but tasks
			currently running on the host will continue until finished.

	r1m
		The 1-minute exponentially averaged CPU run queue length
 
	r5m
		The 5-minute exponentially averaged CPU run queue length
 
	r15m
		The 15-minute exponentially averaged CPU run queue length
 
	ut%
		The CPU utilization exponentially averaged over the last 5 seconds,
		between 0.00 and 100.00.

	pg
		The memory paging rate exponentially averaged over the last minute, in
		pages per second.

	it
		On UNIX, the idle time of the host (keyboard not touched on all logged
		in sessions), in minutes.

	tmp
		The amount of free space in /tmp, in megabytes. 

	swp
		The amount of available SWAP, in megabytes.

	mem
		The amount of available RAM, in megabytes.
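
The r1m, r5m and r15m fields are described as exponentially averaged run queue lengths. One common way to compute such averages is the Unix-style recurrence sketched below in Python; whether hload uses this exact formula is an assumption, as are the function names and the 5-second sampling interval taken from the NAME line.

```python
import math


def ema_update(previous, sample, interval_s, window_s):
    """Blend a new sample into an exponential average over window_s seconds."""
    decay = math.exp(-interval_s / window_s)
    return previous * decay + sample * (1.0 - decay)


def run_queue_averages(samples, interval_s=5.0):
    """Return (r1m, r5m, r15m) after feeding run queue samples in order."""
    r1m = r5m = r15m = 0.0
    for sample in samples:
        r1m = ema_update(r1m, sample, interval_s, 60.0)
        r5m = ema_update(r5m, sample, interval_s, 300.0)
        r15m = ema_update(r15m, sample, interval_s, 900.0)
    return r1m, r5m, r15m
```

With this recurrence a sudden load shows up in r1m first and in r15m last, which matches the pattern in the example output, where r1m is greater than r5m and r5m is greater than r15m on both hosts.
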
EXAMPLE

	% hload
	HOST    status  r1m     r5m     r15m    ut%     pg      it      tmp     swp     mem
	c0101   ok      6.33    3.09    1.16    89.17   207.7   0       202M    509M    216M
	c0102   ok      4.74    1.54    0.54    7.65    6.86    0       553M    509M    74M

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hhosts(1), hopen(1), hclose(1), hslot(1)

hload (hadoop utils) 13.0		September 2007 
hhosts(1)                      HADOOP                       hhosts(1)

NAME
	hhosts - display current status of the host(s). 
 
SYNOPSIS
	hhosts [-h] [-jt jt1;jt2..]

DESCRIPTION
	By default, hhosts returns the information about all hosts in the
	specified cluster. This command will display the host status, task
	slots and so on. 

	-h
		Display usage information.

	-jt jobtracker url
		Specify the cluster by the jobtracker's url link, for
		example: -jt hostA:3479;hostB:3479

OUTPUT
	By default, hhosts displays the following fields:

	HOST
		The name of the host. Every host that currently belongs to the
		cluster specified by the command is listed here.

	STATUS
		The current status of the host (in fact, the status of its
		TaskTracker daemon). The possible values are as follows.
		ok
			The host is available to accept new tasks.
		closed
			The host is not allowed to accept new tasks any more, but tasks
			currently running on the host will continue until finished.
 
	MAX
		The maximum number of tasks, including MAPs and REDUCEs, that can
		run at the same time on a host. (2 * the number of CPUs is recommended.)

	MAP
		The number of MAP tasks currently running on the host.

	REDUCE
		The number of REDUCE tasks currently running on the host.

	FAILURE
		The number of failed tasks on the host.

	TMP_SPACE
		The amount of free space in /tmp, in megabytes. This value is
		important for MAP/REDUCE jobs.

EXAMPLE

	% hhosts
	HOST    STATUS  MAX     MAP     REDUCE  FAILURE TMP_SPACE
	c0101   ok      6       6       0       0       218M
	c0102   ok      6       2       4       0       568M
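
Because the output is a whitespace-separated table, it is easy to post-process. The Python sketch below parses the columns shown above and reports hosts that still have free slots; the function names are hypothetical, and the script assumes the output format is exactly as printed in this example.

```python
def parse_hhosts(text):
    """Turn hhosts output into a list of dicts keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def hosts_with_free_slots(rows):
    """Map each open host to MAX - MAP - REDUCE, keeping only positive values."""
    result = {}
    for row in rows:
        if row["STATUS"] != "ok":
            continue  # closed hosts accept no new tasks
        free = int(row["MAX"]) - int(row["MAP"]) - int(row["REDUCE"])
        if free > 0:
            result[row["HOST"]] = free
    return result
```

For the table in this example both hosts are fully occupied (6 of 6 slots in use), so hosts_with_free_slots returns an empty dict.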

AUTHOR
	Written by Shuguang Liu

REPORT BUGS
	Report bugs to 

COPYRIGHT	
	Copyright (C) 2005-2010 Shuguang Liu's Inc.

	This file is not a free software, all the source codes are protected
	and will not be released to any body or organization without 
	authority. 

SEE ALSO
	hload(1), hopen(1), hclose(1), hslot(1)

hhosts (hadoop utils) 13.0		October 2007 


Re: Enhancement to Hadoop

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Oct 12, 2007, at 1:48 AM, LiuShuguang wrote:

>
> Hello,
>
> I made some enhancement to hadoop based on 0.14. But I do not know  
> how to distribute to others. can you help me on that?

By the way, I think rather than moving jobs to the bottom of the  
queue, you should just change their priority. It would be much easier  
and job priorities are already implemented in 0.15.

-- Owen

Re: Enhancement to Hadoop

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Oct 12, 2007, at 1:48 AM, LiuShuguang wrote:

> I made some enhancement to hadoop based on 0.14. But I do not know  
> how to distribute to others. can you help me on that?

Please read:
http://wiki.apache.org/lucene-hadoop/HowToContribute

You'll need to make sure that it matches the coding standards  
including having the Apache headers at the top of each file and  
removing author lines.

-- Owen