Posted to mapreduce-user@hadoop.apache.org by Lior Schachter <li...@infolinks.com> on 2011/05/18 16:58:39 UTC

Running M/R jobs from java code

Hi,
I have my application installed on Tomcat and I wish to submit M/R jobs
programmatically.
Is there any standard way to do that?

Thanks,
Lior

RE: Running M/R jobs from java code

Posted by Aaron Baff <Aa...@telescope.tv>.
The way I have it working is that all of the MR components (mappers, reducers, etc.) live in a separate NetBeans project, which produces a separate jar file that I then include in the project containing the daemon code.

The way I have the MR components set up, there is a job class with some helper init() functions and such; I use those from my daemon to set up the MR job, and in that job class I use the class itself as the argument to setJarByClass().
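A minimal sketch of that pattern (the class name, job name, and helper shape here are invented for illustration, not taken from Aaron's actual code):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Lives in the separate MR-components jar, alongside the mappers/reducers.
public class CustomerReportJob {

    // Helper init() of the kind described above: the daemon calls this
    // to obtain a configured Job it can then submit and monitor.
    public static Job init(Configuration conf) throws IOException {
        Job job = new Job(conf, "customer-report");
        // Using the job class itself here lets Hadoop locate the
        // MR-components jar and ship it with the job.
        job.setJarByClass(CustomerReportJob.class);
        return job;
    }
}
```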


--Aaron


Re: Running M/R jobs from java code

Posted by Lior Schachter <li...@infolinks.com>.
Hi again.
I keep getting a java.lang.ClassNotFoundException on my mapper class:

11/05/19 16:10:22 INFO mapred.JobClient: Task Id :
attempt_201105051708_0111_m_000043_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
com.infolinks.hadoop.utils.CustomerTool$Finder
.

This is my code for launching M/R jobs from Tomcat:

conf.set("mapred.jar", "/data/hadoop/jobs/customer-tool.jar");
grepJob.setJobName("customer-tool");
grepJob.getConfiguration().set("fs.default.name", "hdfs://hadoop-master.infolinks.local:8000");
grepJob.getConfiguration().set("mapred.job.tracker", "hadoop-master.infolinks.local:8020");
FileInputFormat.setInputPaths(grepJob, "/data/logs/2011-05-10");
grepJob.getConfiguration().set("mapreduce.map.class", "com.infolinks.hadoop.utils.CustomerTool$Finder");
grepJob.getConfiguration().set("mapreduce.combine.class", "com.infolinks.hadoop.utils.CustomerTool$Writer");
grepJob.getConfiguration().set("mapreduce.reduce.class", "com.infolinks.hadoop.utils.CustomerTool$Writer");
Path output = new Path("/data/output_jobs/grep/101");
FileOutputFormat.setOutputPath(grepJob, output);
grepJob.setOutputKeyClass(IntWritable.class);
grepJob.setOutputValueClass(FloatWritable.class);
grepJob.waitForCompletion(true);

I don't have the job jar on my classpath...

Lior
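One workaround for not having the job jar on the webapp classpath (a sketch only, not tested against this setup; the helper class and method names are invented) is to load the mapper class out of the job jar with a URLClassLoader, so that setMapperClass() and the configuration's class loader can both see it while the jar itself is still shipped to the cluster:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class JobJarLoader {
    @SuppressWarnings("unchecked")
    public static void configure(Job job, String jarPath, String mapperName) throws Exception {
        // Make classes inside the job jar visible to this JVM.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { new File(jarPath).toURI().toURL() },
                JobJarLoader.class.getClassLoader());
        Class<? extends Mapper> mapperClass =
                (Class<? extends Mapper>) Class.forName(mapperName, true, loader);
        job.getConfiguration().setClassLoader(loader);
        job.setMapperClass(mapperClass);
        // Still point mapred.jar at the local jar so Hadoop uploads it with the job.
        job.getConfiguration().set("mapred.jar", jarPath);
    }
}
```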


Re: Running M/R jobs from java code

Posted by Lior Schachter <li...@infolinks.com>.
Hi Aaron,
Thanks for your answer.
How should I specify the setJarByClass, since I don't have the jar file in
my classpath (but rather on the namenode)?
I see that I can set it explicitly with conf.set("mapred.jar", [FILE]), but
I couldn't find a file format that works.

Lior



RE: Running M/R jobs from java code

Posted by Aaron Baff <Aa...@telescope.tv>.
Geoffry,

Basically it's replicating what you do in the main() method, and then making sure you give it a Configuration (or get one via Job.getConfiguration()) with those parameters. I forget which ones are for the old API and which are for the new, so I just set both to be safe.

See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/conf/Configuration.html, which also mentions the default config setup (http://hadoop.apache.org/common/docs/current/core-default.html); you can override just about any of those properties when you submit an MR job. Generally you leave most of them alone, but there may be times you want to change them.
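A sketch of that override step (the property names are real Hadoop keys; the host names and port numbers are placeholders, not values from this thread):

```java
import org.apache.hadoop.conf.Configuration;

// Loads core-default.xml plus any *-site.xml found on the classpath.
Configuration conf = new Configuration();

// Override just the properties this client needs; everything else
// keeps its default from the files above.
conf.set("fs.default.name", "hdfs://namenode-host:8020");          // old-API key
conf.set("fs.defaultFS", "hdfs://namenode-host:8020");             // new-API key
conf.set("mapred.job.tracker", "jobtracker-host:8021");            // old-API key
conf.set("mapreduce.jobtracker.address", "jobtracker-host:8021");  // new-API key
```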

--Aaron

Re: Running M/R jobs from java code

Posted by Geoffry Roberts <ge...@gmail.com>.
Aaron,

I didn't know one could do this. Thanks, I'll give it a try.



-- 
Geoffry Roberts

RE: Running M/R jobs from java code

Posted by Aaron Baff <Aa...@telescope.tv>.
It's not terribly hard to submit MR jobs. Create a Hadoop Configuration object, and set its fs.default.name and fs.defaultFS to the NameNode URI, and its mapreduce.jobtracker.address and mapred.job.tracker to the JobTracker URI. You can then easily set up and use a Job object (new API), or JobConf and JobClient (old API, I think), to create and submit an MR job, and also monitor its state and progress from within Java. You'll just need to make sure any third-party libraries you require are either inside the job jar, or on HDFS and added to the MR job through the distributed-cache mechanism.

We're doing this extensively, with a Java daemon using Thrift so that our PHP UI can talk to the daemon, start reports, monitor them, and retrieve the results once they are done. The daemon starts up all the MR jobs in the order necessary to complete a report. It works quite well, generally speaking, at least for us.
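A rough end-to-end sketch of the approach described above (the host names, ports, and paths are placeholders, and the identity Mapper/Reducer stand in for real MR components):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitFromJava {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:8020");          // old key
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");             // new key
        conf.set("mapred.job.tracker", "jobtracker-host:8021");            // old key
        conf.set("mapreduce.jobtracker.address", "jobtracker-host:8021");  // new key

        Job job = new Job(conf, "example");
        job.setJarByClass(SubmitFromJava.class);  // jar containing the MR components
        job.setMapperClass(Mapper.class);         // identity classes, stand-ins only
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("/in"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));

        job.submit();                             // non-blocking submit
        while (!job.isComplete()) {               // monitor state and progress
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "done" : "failed");
    }
}
```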

--Aaron


Re: Running M/R jobs from java code

Posted by Joey Echeverria <jo...@cloudera.com>.
Just last week I worked on a REST interface hosted in Tomcat that
launched an MR job. In my case, I included the jar with the job in the
WAR and called the run() method (the job implemented Tool). The only
tricky part was that a copy of the Hadoop configuration files needed to
be on the classpath, but I just added those to the main Tomcat classpath.
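The shape of this, roughly (Tool, Configured, and ToolRunner are the real Hadoop API; the class names, the handler, and the empty run() body are invented for the sketch):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// The job class, packaged in a jar inside the WAR.
public class ReportTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Build and submit the MR job here, using getConf()
        // for the cluster configuration.
        return 0;
    }
}

// Called from the REST handler. new Configuration() picks up the
// Hadoop config files that were added to the Tomcat classpath.
class Launcher {
    static int launch(String[] jobArgs) throws Exception {
        return ToolRunner.run(new Configuration(), new ReportTool(), jobArgs);
    }
}
```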

The Tomcat server was not on the same node as any other cluster
machine, but there was no firewall between it and the cluster.

Oh, I also had to create a tomcat6 user on the namenode/jobtracker and
create a home directory in HDFS. I could have probably called
set("user.name", "existing_user") in the configuration to avoid adding
the tomcat6 user.

-Joey




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Running M/R jobs from java code

Posted by Geoffry Roberts <ge...@gmail.com>.
I am confronted with the same problem.  What I plan to do is have a
servlet simply execute a command on the machine from which I would start
the job if I were running it from the command line.

e.g.
$ ssh <remote host> '<hadoop_home>/bin/hadoop jar myjob.jar'

Another possibility would be to rig some kind of RMI thing.

Now here's an idea:  Use an aglet. ;-)  If I get into a funky mood I might
just give this a try.
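A servlet could shell out roughly like this (a sketch; the ssh invocation mirrors the command line above, but the host and paths are whatever your setup uses, and in practice you would want key-based ssh auth and real error handling):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class RemoteJobRunner {

    // Build the command line: ssh <host> '<hadoop_home>/bin/hadoop jar <jar>'
    static List<String> buildCommand(String host, String hadoopHome, String jar) {
        return Arrays.asList("ssh", host, hadoopHome + "/bin/hadoop jar " + jar);
    }

    // Run a command, echoing its combined stdout/stderr, and return the exit code.
    static int run(List<String> command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command).redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        return p.waitFor();
    }
}
```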



-- 
Geoffry Roberts

Re: Running M/R jobs from java code

Posted by Lior Schachter <li...@infolinks.com>.
Another machine in the cluster.


Re: Running M/R jobs from java code

Posted by Geoffry Roberts <ge...@gmail.com>.
Is Tomcat installed on your Hadoop name node, or on another machine?




-- 
Geoffry Roberts