Posted to common-user@hadoop.apache.org by "Poole, Samuel [USA]" <po...@bah.com> on 2009/08/18 23:40:59 UTC

Hadoop for Independent Tasks not using Map/Reduce?

I am new to Hadoop (I have not yet installed or configured it), and I want to make sure that I have the correct tool for the job.  I do not currently need the Map/Reduce functionality, but I am interested in using Hadoop for task orchestration, task monitoring, etc. across numerous nodes in a computing cluster.  Our primary programs (written in C++ and launched via shell scripts) each run independently on a single node, but are deployed to different nodes for load balancing.  I want to launch these processes on different nodes from a Java program located on a central server, and I was hoping to use Hadoop as a foundation for this.

I read the following in the FAQ section:

"How do I use Hadoop Streaming to run an arbitrary set of (semi-)independent tasks?

Often you do not need the full power of Map Reduce, but only need to run multiple instances of the same program - either on different parts of the data, or on the same data, but with different parameters. You can use Hadoop Streaming to do this."

So, two questions I guess.

1.  Can I use Hadoop for this purpose without using Map/Reduce functionality?

2.  Are there any examples available on how to implement this sort of configuration?

Any help would be greatly appreciated.

Sam

Re: Hadoop for Independent Tasks not using Map/Reduce?

Posted by Owen O'Malley <om...@apache.org>.

On Aug 18, 2009, at 2:40 PM, Poole, Samuel [USA] wrote:

> I am new to Hadoop (I have not yet installed/configured), and I want  
> to make sure that I have the correct tool for the job.  I do not  
> "currently" have a need for the Map/Reduce functionality, but I am  
> interested in using Hadoop for task orchestration, task monitoring,  
> etc. over numerous nodes in a computing cluster.  Our primary  
> programs (written in C++ and launched via shell scripts) each run  
> independently on a single node, but are deployed to different nodes  
> for load balancing.  I want to task/initiate these processes on  
> different nodes through a Java program located on a central server.   
> I was hoping to use Hadoop as a foundation for this.

Just create a job with 0 reduces. The map tasks will run independently  
across the cluster.  Take a look at RandomWriter, which just writes a  
set of random data files.

-- Owen
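
Editor's sketch: the reply above describes a Java job such as RandomWriter. The same "0 reduces" idea can also be expressed through Hadoop Streaming, which is swapped in here because it does not require writing a Java job. The helper below only builds the command line; it does not submit anything. The function name, the jar location, and the file names are hypothetical, and the property name mapred.reduce.tasks is the one used by Hadoop releases of this era (later releases may spell it differently).

```python
import os
import shlex


def build_map_only_command(streaming_jar, input_path, output_path, mapper_script):
    """Return the argv list for a Hadoop Streaming job with zero reduce tasks."""
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -D go before the streaming-specific options.
        "-D", "mapred.reduce.tasks=0",
        "-input", input_path,
        "-output", output_path,
        # The mapper is referred to by its file name on the task node ...
        "-mapper", os.path.basename(mapper_script),
        # ... and -file ships the local script along with the job.
        "-file", mapper_script,
    ]


if __name__ == "__main__":
    # All paths below are placeholders; adjust them to the actual installation.
    cmd = build_map_only_command(
        "/path/to/hadoop-streaming.jar",
        "tasks.txt",
        "task-output",
        "scripts/run_task.py",
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

With zero reduces, each map task's output is written directly to the output directory, so there is no shuffle or sort step between the independent tasks.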

Re: Hadoop for Independent Tasks not using Map/Reduce?

Posted by yang song <ha...@gmail.com>.

Hadoop Streaming is a utility that allows you to create and run Map/Reduce jobs
with any executable or script as the mapper and/or the reducer. I'm not
familiar with it, but I think you can find something useful here:
http://hadoop.apache.org/common/docs/current/streaming.html
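
Editor's sketch: one way to use a script as the mapper for independent tasks is to treat every input line as one task. The mapper below is a minimal illustration under that assumption, not something taken from the Streaming documentation: it assumes each line of the (trusted) input file is a complete shell command, for example a call to one of the launch scripts, and it writes one "command<TAB>exit code" line per task. The function name run_tasks is hypothetical.

```python
import subprocess
import sys


def run_tasks(lines, out=sys.stdout):
    """Run each non-empty line as a shell command and report its exit code."""
    for line in lines:
        command = line.strip()
        if not command:
            continue  # ignore blank lines in the task list
        # shell=True because the tasks are launched via shell scripts;
        # only use this with a task list you control.
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Tab-separated output, so the command acts as key and the code as value.
        out.write(f"{command}\t{result.returncode}\n")


if __name__ == "__main__":
    # Hadoop Streaming feeds the mapper its input records on standard input.
    run_tasks(sys.stdin)
```

How the task lines are spread over map tasks (and therefore over nodes) depends on the input format and split size of the job, so a small task file may need to be split into several input files to get real parallelism.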
