Posted to common-user@hadoop.apache.org by vishalsant <Vi...@gmail.com> on 2010/01/22 22:12:00 UTC

Can I submit jobs to a hadoop cluster without using hadoop executable

I am a newbie to Hadoop, so please bear with me if this is naive.

I have defined a Mapper/Reducer and I want to run it on a Hadoop cluster.
My question is:

* Do I need to put the Mapper/Reducer on the classpath of all my
DataNode/JobTracker nodes, or can they be uploaded to the cluster as
mobile code?

I would like

* to define my job in its totality on a client JVM (independent of the
cluster),
* to compile it,
* to run its main method, with the conf pointing to the already
established/running cluster.

The Mapper/Reducer would be serialized over to the JobTracker JVM, the
classes deserialized and mapped to Mapper and Reducer, and then, based on
the input and output arguments, map() and reduce() would execute. Roughly
what I have in mind is sketched below.
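In code, something like this (only a sketch of the intent; the host names,
paths, and classes are all made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobClient {

    // Stand-ins for my real classes; they exist only on this client JVM.
    public static class MyMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(value, new IntWritable(1));
        }
    }

    public static class MyReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the already-running cluster (placeholder hosts).
        conf.set("fs.default.name", "hdfs://namenode:9000");
        conf.set("mapred.job.tracker", "jobtracker:9001");

        Job job = new Job(conf, "my-job");
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // This is the open question: how do MyMapper/MyReducer get from
        // this JVM to the TaskTrackers?
        job.submit();
    }
}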

Is that even possible? 

Or do I have to manually move over to a Hadoop installation and always
execute it through the hadoop executable?



Re: Can I submit jobs to a hadoop cluster without using hadoop executable

Posted by vishalsant <Vi...@gmail.com>.
I opted (for the time being) to unjar the dependencies and combine
everything into a single job jar. It is not painful until you have a large
number of jars; something like this:


<target name="JOB" depends="clobber">
    <property name="a.jar" value="${dist.dir}/lib/a.jar" />
    <property name="b.jar" value="${dist.dir}/lib/b.jar" />
    <property name="extracted" value="${dist.dir}/extracted" />

    <!-- Unpack the dependency jars into a scratch directory. -->
    <mkdir dir="${extracted}" />
    <unjar dest="${extracted}" src="${a.jar}" />
    <unjar dest="${extracted}" src="${b.jar}" />

    <!-- Repack our own classes plus the extracted dependency classes
         into one self-contained job jar. -->
    <jar jarfile="${dist.job.dir}/JOB.jar"
         basedir="${build.classes.dir}">
        <fileset dir="${build.classes.dir}">
            <include name="**/*.class" />
        </fileset>
        <fileset dir="${extracted}">
            <include name="**/*.class" />
        </fileset>
        <manifest>
            <attribute name="Main-Class"
                       value="myClass" />
        </manifest>
    </jar>

    <delete dir="${extracted}" />
</target>
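For completeness: after running the JOB target, the merged JOB.jar can be
submitted in the usual way with the hadoop executable, or shipped from a
client via mapreduce.job.jar as discussed elsewhere in this thread.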


* Adding this in case folks need to see one way of doing it.
 



Re: Can I submit jobs to a hadoop cluster without using hadoop executable

Posted by Steve Loughran <st...@apache.org>.
vishalsant wrote:
> JobConf.setJar(..) might be the way, but that class is deprecated and no
> method on Job has a corresponding addition.

1. Configuration.set("mapreduce.job.jar", jarName) is what you should
use to set this. You can find the jar by using
JobConf.findContainingJar(Class).

2. To submit jobs, you should have the Hadoop JAR(s) on your classpath,
along with commons-logging and any other dependencies. For that I have
commons-httpclient, commons-net, commons-codec, xmlenc, commons-cli, ...

3. JobClient.submitJob() shows you how to submit jobs; it wraps
Job.submit() with some extra (optional) setup.
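
Putting 1-3 together with the sketch in the original post, the missing
piece would be something like this (untested; it assumes the client JVM
was itself started from a jar that contains MyMapper/MyReducer):

import org.apache.hadoop.mapred.JobConf;

// Find the jar the Mapper class was loaded from and tell the cluster to
// ship it with the job. Set it on the conf *before* constructing the Job,
// since Job copies the configuration at construction time.
String jar = JobConf.findContainingJar(MyMapper.class);
conf.set("mapreduce.job.jar", jar);

Job job = new Job(conf, "my-job");
// ... mapper/reducer, key/value classes, input/output paths as before ...
job.submit();

In the new API, job.setJarByClass(MyMapper.class) should do the same
lookup for you.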



Re: Can I submit jobs to a hadoop cluster without using hadoop executable

Posted by vishalsant <Vi...@gmail.com>.
JobConf.setJar(..) might be the way, but that class is deprecated and no
method on Job has a corresponding addition.
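
To show what I mean, the old mapred API has it directly (the path is a
placeholder):

import org.apache.hadoop.mapred.JobConf;

// Deprecated API: attach the job jar to the configuration directly.
JobConf jobConf = new JobConf();
jobConf.setJar("/path/to/myjob.jar");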



