Posted to common-user@hadoop.apache.org by Daniel Yehdego <dt...@miners.utep.edu> on 2011/10/25 22:53:14 UTC

Job Submission schedule, one file at a time?

Hi, 
I have a folder with 50 different files, and I want to submit a Hadoop MapReduce job using each file as an input. My Map/Reduce programs do essentially the same work for each file, but I want to schedule and submit the jobs one file at a time: submit a job with one file as input, wait until it completes, then submit the second job (second file) right after. I want 50 separate MapReduce outputs for the 50 input files.
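In rough, untested Java (new MapReduce API, Hadoop 0.20-style), the driver I have in mind looks something like the sketch below. MyMapper and MyReducer are placeholders for my real classes, and the paths are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PerFileDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path("/user/daniel/input");    // folder holding the 50 files
        Path outputBase = new Path("/user/daniel/output"); // parent for per-file output dirs

        // One job per file, strictly one after another.
        for (FileStatus stat : fs.listStatus(inputDir)) {
            Path in = stat.getPath();
            Job job = new Job(conf, "per-file-" + in.getName());
            job.setJarByClass(PerFileDriver.class);
            job.setMapperClass(MyMapper.class);    // placeholder for my real mapper
            job.setReducerClass(MyReducer.class);  // placeholder for my real reducer
            job.setNumReduceTasks(1);              // single reducer, as in my current job
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, in);
            // separate output dir per input file; must not exist before the job runs
            FileOutputFormat.setOutputPath(job, new Path(outputBase, in.getName()));
            job.waitForCompletion(true); // blocks until this job finishes
        }
    }
}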
Looking forward to your input. Thanks.
Regards, 



RE: Job Submission schedule, one file at a time?

Posted by Daniel Yehdego <dt...@miners.utep.edu>.
Hi Mike, thanks for your quick response.
What I am looking for is this: I have an M/R job that takes an input file, processes it, and writes through a single reducer. But I have many files queued up to be processed the same way, and I don't want to submit a separate M/R job for each file manually. Is there a way to submit these files one after another, or concurrently? All the files are independent of each other. I think I need some kind of job scheduler, but I am not sure how to proceed.
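For the concurrent case, for example, would something like this work? It's untested, and configureJobFor() is a made-up helper standing in for the same per-file job setup as in my first message; the loop would go inside the same kind of driver:

List<Job> running = new ArrayList<Job>();
for (FileStatus stat : fs.listStatus(inputDir)) {
    Job job = configureJobFor(stat.getPath()); // hypothetical helper: per-file setup as before
    job.submit();                              // returns immediately, unlike waitForCompletion()
    running.add(job);
}
// wait for all of them to finish
for (Job job : running) {
    while (!job.isComplete()) {
        Thread.sleep(5000); // poll every five seconds
    }
}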

Regards, 


> Subject: Re: Job Submission schedule, one file at a time?
> From: michael_segel@hotmail.com
> Date: Tue, 25 Oct 2011 16:25:40 -0500
> To: common-user@hadoop.apache.org
> 
> Not sure what you are attempting to do...
> If you submit the directory name... you get a single m/r job to process everything
> (but it doesn't sound like that is what you want...)
> 
> You could use Oozie, or just a simple shell script that walks down the list of files in the directory and launches a Hadoop job for each one...
> 
> Or did you want something else?
> 
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Oct 25, 2011, at 3:53 PM, Daniel Yehdego <dt...@miners.utep.edu> wrote:
> 
> > 
> > Hi, 
> > I have a folder with 50 different files, and I want to submit a Hadoop MapReduce job using each file as an input. My Map/Reduce programs do essentially the same work for each file, but I want to schedule and submit the jobs one file at a time: submit a job with one file as input, wait until it completes, then submit the second job (second file) right after. I want 50 separate MapReduce outputs for the 50 input files.
> > Looking forward to your input. Thanks.
> > Regards, 
> > 
> > 

Re: Job Submission schedule, one file at a time?

Posted by Michel Segel <mi...@hotmail.com>.
Not sure what you are attempting to do...
If you submit the directory name... you get a single m/r job to process everything
(but it doesn't sound like that is what you want...)
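For completeness, the single-job version is just a matter of pointing the input at the whole directory (paths made up):

// one job over every file in the folder; one combined output
FileInputFormat.addInputPath(job, new Path("/user/daniel/input"));
FileOutputFormat.setOutputPath(job, new Path("/user/daniel/output"));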

You could use Oozie, or just a simple shell script that walks down the list of files in the directory and launches a Hadoop job for each one...

Or did you want something else?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Oct 25, 2011, at 3:53 PM, Daniel Yehdego <dt...@miners.utep.edu> wrote:

> 
> Hi, 
> I have a folder with 50 different files, and I want to submit a Hadoop MapReduce job using each file as an input. My Map/Reduce programs do essentially the same work for each file, but I want to schedule and submit the jobs one file at a time: submit a job with one file as input, wait until it completes, then submit the second job (second file) right after. I want 50 separate MapReduce outputs for the 50 input files.
> Looking forward to your input. Thanks.
> Regards, 
> 
> 
>