Posted to general@hadoop.apache.org by Ankit Gandhi <an...@gmail.com> on 2010/10/25 06:42:15 UTC

Running jar files inside map task

Hey,
I want to know whether I can run a jar file inside a map task, because I
need to use that program's output in my map task.
It works in standalone mode, but it fails in pseudo-distributed mode.
Thanks in advance

-- 
Ankit Gandhi
Undergraduate
Center for Visual Information Technology
Computer Science Engineering & Dual Degree
IIIT-Hyderabad

RE: Running jar files inside map task

Posted by Michael Segel <mi...@hotmail.com>.
Uhm... just to take this in a different direction ...

Yes, you can run a jar file within your hadoop m/r job. 
You may end up with class loader problems though, depending on what's in the jar.
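One way to run a jar from a map task while avoiding class loader conflicts is to start it in a separate JVM and read its standard output. The sketch below uses only JDK classes and assumes the jar is already present on the task node (for example, shipped with DistributedCache); the names ExternalJarRunner and runAndCapture are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs an external command such as "java -jar tool.jar ..."
// in a child process and returns what it prints. Because the jar runs in its own
// JVM, its classes never enter the map task's class loader.
public class ExternalJarRunner {

    public static List<String> runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so the child cannot stall on a full, unread stderr pipe.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        // Drain the output before waiting, otherwise a chatty child can block forever.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            // Surface the failure to the task instead of silently using partial output.
            throw new IOException("Command " + command + " exited with code "
                    + exitCode + ", output: " + lines);
        }
        return lines;
    }
}
```

Inside a mapper you might call runAndCapture(Arrays.asList(System.getProperty("java.home") + "/bin/java", "-jar", "tool.jar", inputPath)), where tool.jar and inputPath are placeholders. Using java.home picks the same Java installation the task JVM runs on, rather than whatever java happens to be on the PATH. Because stderr is merged into stdout here, any diagnostics the tool prints will also appear in the returned lines.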

I think what I'm talking about goes beyond the concept of chaining jobs, so I'm just tossing it out there as a response to your question ...

HTH

-Mike


> From: qwertymaniac@gmail.com
> Date: Wed, 27 Oct 2010 15:21:28 +0530
> Subject: Re: Running jar files inside map task
> To: general@hadoop.apache.org
> 
> On Wed, Oct 27, 2010 at 12:52 PM, gaurav bagga <ga...@gmail.com> wrote:
> > It would be great if you could tell or point me to an article which uses the
> > output of first map reduce as input for the 2nd map reduce.
> 
> There's ChainMapper and ChainReducer that let you do [MAP+ / REDUCE
> MAP*] kind of job configuration (single job).
> 
> If you wish to chain jobs (To look like MRMRMR-ish), look at
> o.a.h.mapred.jobcontrol.Job at:
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html
> 
> Specifically, look at Job.addDependingJob.

Re: Running jar files inside map task

Posted by Harsh J <qw...@gmail.com>.
On Wed, Oct 27, 2010 at 12:52 PM, gaurav bagga <ga...@gmail.com> wrote:
> It would be great if you could tell or point me to an article which uses the
> output of first map reduce as input for the 2nd map reduce.

ChainMapper and ChainReducer let you build a single job of the form
[MAP+ / REDUCE MAP*]: one or more mappers, then the reducer, then zero or
more mappers.

If you want to chain separate jobs (an MR-MR-MR pipeline), look at
o.a.h.mapred.jobcontrol.Job at:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html

Specifically, look at Job.addDependingJob.
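The dependency rule that Job.addDependingJob expresses can be modelled in plain Java. This sketch is not Hadoop's API: DependencyChain, Step and runAll are hypothetical names, and only the state names (SUCCESS, FAILED, DEPENDENT_FAILED) are borrowed from o.a.h.mapred.jobcontrol.Job. In real code you would wrap each JobConf in a jobcontrol.Job, call addDependingJob, add the jobs to a JobControl and run it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (NOT Hadoop code) of dependency-ordered job execution:
// a job runs only after every job it depends on has succeeded, and is marked
// DEPENDENT_FAILED without running if any of its dependencies failed.
public class DependencyChain {

    public enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    // Hypothetical stand-in for a MapReduce job; run() returns true on success.
    public interface Step {
        boolean run();
    }

    private static final class Node {
        final String name;
        final Step step;
        final List<Node> dependsOn = new ArrayList<>();
        State state = State.WAITING;

        Node(String name, Step step) {
            this.name = name;
            this.step = step;
        }
    }

    private final Map<String, Node> nodes = new LinkedHashMap<>();

    public void addJob(String name, Step step) {
        nodes.put(name, new Node(name, step));
    }

    // Analogue of job.addDependingJob(dependency): 'job' waits for 'dependency'.
    public void addDependingJob(String job, String dependency) {
        node(job).dependsOn.add(node(dependency));
    }

    private Node node(String name) {
        Node n = nodes.get(name);
        if (n == null) throw new IllegalArgumentException("Unknown job: " + name);
        return n;
    }

    // Keeps sweeping over the jobs until no job can change state; returns final states.
    public Map<String, State> runAll() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Node n : nodes.values()) {
                if (n.state != State.WAITING) continue;
                boolean ready = true;
                boolean dependencyFailed = false;
                for (Node d : n.dependsOn) {
                    if (d.state == State.FAILED || d.state == State.DEPENDENT_FAILED) {
                        dependencyFailed = true;
                    } else if (d.state != State.SUCCESS) {
                        ready = false;
                    }
                }
                if (dependencyFailed) {
                    n.state = State.DEPENDENT_FAILED;
                    progress = true;
                } else if (ready) {
                    n.state = n.step.run() ? State.SUCCESS : State.FAILED;
                    progress = true;
                }
            }
        }
        Map<String, State> result = new LinkedHashMap<>();
        for (Node n : nodes.values()) result.put(n.name, n.state);
        return result;
    }
}
```

The order in which jobs are registered does not matter: a second job that depends on the first waits until the first reports SUCCESS, and if the first fails, the second is marked DEPENDENT_FAILED without running. Jobs caught in a dependency cycle simply stay WAITING.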

>
> -Gaurav



-- 
Harsh J
www.harshj.com

Re: Running jar files inside map task

Posted by Kumar Harshit <hk...@gmail.com>.
I think this is not difficult. Let me give an example here.

There are two MapReduce jobs, MR1 and MR2.
MR1 has Map1 and Reduce1 as its mapper and reducer.
MR2 has Map2 and Reduce2 as its mapper and reducer.

MR1's input directory is Input1 and its output directory is Output1.
MR2's input directory is Output1 (*this makes sure that the output of MR1 is
the input to MR2*) and its output directory is Output2.

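In the main() function (the driver) of MR1, set the input and output
directories on MR1's JobConf (called conf1 here; FileInputFormat and
FileOutputFormat are from org.apache.hadoop.mapred):
FileInputFormat.setInputPaths(conf1, new Path("Input1"));
FileOutputFormat.setOutputPath(conf1, new Path("Output1"));

In the main() function of MR2, do the same on MR2's JobConf (conf2):
FileInputFormat.setInputPaths(conf2, new Path("Output1")); // MR1's output directory is now MR2's input directory
FileOutputFormat.setOutputPath(conf2, new Path("Output2"));

Run MR1 to completion before starting MR2. JobClient.runJob(conf1) blocks
until the job finishes (and throws an IOException if it fails), so MR2 can
be launched once MR1's main() has returned successfully.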

Search the web for more examples; there are plenty of them.
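To see why pointing MR2's input at MR1's output directory is enough, here is a local, single-process model of the two-job chain described above. It is plain Java, not Hadoop code: runWordCount and runGroupByCount are hypothetical stand-ins for MR1 and MR2, and each writes a single part-00000 file in the style of a Hadoop output directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local model (NOT Hadoop code) of chaining two jobs through a directory:
// MR1 reads Input1 and writes Output1; MR2 reads Output1 and writes Output2.
public class TwoStagePipeline {

    // MR1 stand-in: word count over every file in inputDir, written as "word\tcount".
    public static void runWordCount(Path inputDir, Path outputDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        List<String> out = new ArrayList<>();
        counts.forEach((word, count) -> out.add(word + "\t" + count));
        writePart(outputDir, out);
    }

    // MR2 stand-in: reads MR1's "word\tcount" part files and groups words by count.
    public static void runGroupByCount(Path inputDir, Path outputDir) throws IOException {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "part-*")) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String[] kv = line.split("\t");
                    byCount.computeIfAbsent(Integer.parseInt(kv[1]), k -> new ArrayList<>())
                           .add(kv[0]);
                }
            }
        }
        List<String> out = new ArrayList<>();
        byCount.forEach((count, words) -> out.add(count + "\t" + String.join(",", words)));
        writePart(outputDir, out);
    }

    // Like a Hadoop job, refuse to write into an output directory that already exists.
    private static void writePart(Path outputDir, List<String> lines) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
        Files.createDirectories(outputDir);
        Files.write(outputDir.resolve("part-00000"), lines, StandardCharsets.UTF_8);
    }

    // Runs MR1, then MR2 on MR1's output directory.
    public static void runChain(Path base) throws IOException {
        runWordCount(base.resolve("Input1"), base.resolve("Output1"));
        runGroupByCount(base.resolve("Output1"), base.resolve("Output2"));
    }
}
```

Calling runChain(base) a second time fails because Output1 already exists. A real Hadoop job also refuses to write into an existing output directory, so delete old output directories (or use fresh names) before rerunning the chain.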

Best
Kumar
On Wed, Oct 27, 2010 at 3:22 AM, gaurav bagga <ga...@gmail.com> wrote:

> It would be great if you could tell or point me to an article which uses the
> output of first map reduce as input for the 2nd map reduce.
>
> -Gaurav

Re: Running jar files inside map task

Posted by gaurav bagga <ga...@gmail.com>.
It would be great if you could explain, or point me to an article that shows,
how to use the output of a first MapReduce job as the input for a second one.

-Gaurav



On Tue, Oct 26, 2010 at 7:02 PM, Kumar Harshit <hk...@gmail.com>wrote:

> You can create 2nd map reduce job. The input to the mapper of 2nd Map
> Reduce
> job is the output of 1st Map Reduce job. This way you can tackle the issue.
>
> Hope it helps
>
> Kumar

Re: Running jar files inside map task

Posted by Kumar Harshit <hk...@gmail.com>.
You can create a second MapReduce job whose mapper reads the output of the
first MapReduce job as its input. That way you can tackle the issue.

Hope it helps

Kumar

On Mon, Oct 25, 2010 at 1:42 PM, Ankit Gandhi <an...@gmail.com> wrote:

> Hey,
> I want to know whether can I run a jar file inside a map task or not
> because
> I have to use the output of that file in my map task.
> I am able to run it in standalone mode but it fails in pseudo-distributed
> mode.
> Thanks in advance
>
> --
> Ankit Gandhi
> Undergraduate
> Center for Visual Information Technology
> Computer Science Engineering & Dual Degree
> IIIT-Hyderabad
>

Re: Running jar files inside map task

Posted by Harsh J <qw...@gmail.com>.
Is DistributedCache what you're looking for?
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html

On Mon, Oct 25, 2010 at 10:12 AM, Ankit Gandhi <an...@gmail.com> wrote:
> Hey,
> I want to know whether can I run a jar file inside a map task or not because
> I have to use the output of that file in my map task.
> I am able to run it in standalone mode but it fails in pseudo-distributed
> mode.
> Thanks in advance
>
> --
> Ankit Gandhi
> Undergraduate
> Center for Visual Information Technology
> Computer Science Engineering & Dual Degree
> IIIT-Hyderabad
>



-- 
Harsh J
www.harshj.com