Posted to common-user@hadoop.apache.org by Michael Robinson <ha...@gmail.com> on 2010/05/29 20:31:00 UTC

calling C programs from Hadoop

I am new to Hadoop. I have successfully run Java programs from Hadoop and I
would now like to call C programs from Hadoop.

Thank you for your help

Michael
-- 
View this message in context: http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p854833.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Asif,

Thanks very much for your help. I found and downloaded Hadoop streaming.

Michael

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Jeff,

Reading "Hadoop Streaming" I found the following:

"How Does Streaming Work
In the above example, both the mapper and the reducer are executables that
read the input from stdin (line by line) and emit the output to stdout. The
utility will create a Map/Reduce job, submit the job to an appropriate
cluster, and monitor the progress of the job until it completes."

I am beginning to think that my understanding of map/reduce is faulty. As I
understand it, the mapper takes in data and splits it into chunks, creating
lists of (<key>, <values>) pairs; this output is then combined and the
result is sent to the reducer.

The C program I have reads each line of the input file and searches a master
file for exact and similar matches, then does computations based on how
similar the results are, so there is no need to create (<key>, <values>)
lists.
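
A minimal sketch of a mapper that follows the streaming contract quoted above:
it reads records from stdin one line at a time and writes one tab-separated
line per record to stdout. The names run_mapper and process_line and the 4 KiB
line limit are assumptions made for illustration; process_line is only a
placeholder for the master-file comparison, which is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-line work: this placeholder only measures the line.
   A real program would do its master-file lookup here instead. */
static size_t process_line(const char *line)
{
    return strlen(line);
}

/* Reads records from `in` one line at a time and writes one
   "line<TAB>result" line per record to `out`. By default Hadoop
   streaming treats the text before the first tab as the key.
   Returns the number of records handled. */
static long run_mapper(FILE *in, FILE *out)
{
    char line[4096];   /* assumption: every record is shorter than 4 KiB */
    long records = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        fprintf(out, "%s\t%zu\n", line, process_line(line));
        records++;
    }
    return records;
}
```

Wrapped in `int main(void) { run_mapper(stdin, stdout); return 0; }`, this can
be passed to Hadoop streaming with -mapper; adding `-numReduceTasks 0` asks
for a map-only job, so no meaningful (<key>, <values>) grouping is required.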


Thanks very much

Michael

Re: calling C programs from Hadoop

Posted by Jeff Bean <jw...@cloudera.com>.
Hi Michael,

Why did you determine that Hadoop streaming was insufficient for you?

Jeff

On Mon, May 31, 2010 at 9:17 AM, Michael Robinson
<ha...@gmail.com> wrote:

>
> Hi Jeff,
>
> I have a C program that processes very large compressed data files, so
> this program has to have full control of the process. However, the input
> data can be broken down into chunks, and a separate (distributed) process
> can be run for each chunk, which is what I am doing now, but manually.
>
> I am looking to use a distributed system like Hadoop so that it controls
> the scheduling and all those great things I have read about Hadoop.
>
> I was wondering if I can have Hadoop run a batch file (.bat in Windows or
> .sh in Linux). I would also like to run this in virtual machines.
>
> Thanks
>
> Michael
>

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Hi Jeff,

I have a C program that processes very large compressed data files, so this
program has to have full control of the process. However, the input data can
be broken down into chunks, and a separate (distributed) process can be run
for each chunk, which is what I am doing now, but manually.

I am looking to use a distributed system like Hadoop so that it controls the
scheduling and all those great things I have read about Hadoop.

I was wondering if I can have Hadoop run a batch file (.bat in Windows or
.sh in Linux). I would also like to run this in virtual machines.
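
One way the manual per-chunk runs could be handed to a scheduler is sketched
below, assuming the job input is a plain text file listing one chunk path per
line and that every node can reach those paths (for example on a shared
filesystem). Each task then launches the existing, unmodified program on the
chunks it is given. The names run_on_chunk and run_chunk_mapper are
hypothetical, and POSIX fork/execlp/waitpid are assumed, so this applies to
Linux rather than Windows.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `program chunk_path` as a child process and waits for it.
   Returns the child's exit status (127 if exec failed), or -1 if the
   child could not be started or did not exit normally. */
static int run_on_chunk(const char *program, const char *chunk_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: detach from the list of chunk paths on stdin. */
        close(STDIN_FILENO);
        if (open("/dev/null", O_RDONLY) != STDIN_FILENO)
            _exit(126);
        execlp(program, program, chunk_path, (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Treats each non-empty line of `in` as one chunk path, runs `program`
   on it, and writes "path<TAB>exit status" to `out`.
   Returns the number of chunks whose run did not exit with status 0. */
static int run_chunk_mapper(FILE *in, FILE *out, const char *program)
{
    char path[4096];   /* assumption: paths are shorter than 4 KiB */
    int failures = 0;

    assert(in != NULL && out != NULL && program != NULL);
    while (fgets(path, sizeof path, in) != NULL) {
        path[strcspn(path, "\n")] = '\0';
        if (path[0] == '\0')
            continue;               /* skip blank lines */
        int rc = run_on_chunk(program, path);
        fprintf(out, "%s\t%d\n", path, rc);
        if (rc != 0)
            failures++;
    }
    return failures;
}
```

Because the child inherits stdout, anything the program prints also ends up
in the task output; a real wrapper would probably redirect it to a file.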

Thanks


Michael

Re: calling C programs from Hadoop

Posted by Jeff Bean <jw...@cloudera.com>.
Hi Michael,

How come you can't specify the C program as the mapper in streaming and just
have no reducers?

Jeff

On Sat, May 29, 2010 at 6:14 PM, Michael Robinson
<ha...@gmail.com> wrote:

>
> Thanks for your answers.
>
> I have read about Hadoop streaming and I think it is great. However, what
> I am trying to do is run a C program that I have, with its own data, and
> have Hadoop do the scheduling and run it on multiple nodes as a
> distributed system.
>
> My processing is NOT a map-and-reduce type of process, so I was thinking
> of either feeding the C program to Hadoop or writing a Java program that
> would call the C program and have Hadoop do its magic.
>
> Thanks
>
> Michael
>

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Thanks for your answers.

I have read about Hadoop streaming and I think it is great. However, what I
am trying to do is run a C program that I have, with its own data, and have
Hadoop do the scheduling and run it on multiple nodes as a distributed
system.

My processing is NOT a map-and-reduce type of process, so I was thinking of
either feeding the C program to Hadoop or writing a Java program that would
call the C program and have Hadoop do its magic.

Thanks

Michael

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Owen,

Where can I find information about Pipes?

Thanks much

Michael

Re: calling C programs from Hadoop

Posted by Owen O'Malley <om...@apache.org>.
On Sat, May 29, 2010 at 12:52 PM, Asif Jan <As...@unige.ch> wrote:
> Look at Hadoop streaming; maybe it will be helpful to you.

There is also Pipes, which is the C++ interface to MapReduce.

-- Owen

Re: calling C programs from Hadoop

Posted by Michael Robinson <ha...@gmail.com>.
Hi Brian,

Yes, it is a batch process.

I am using Ubuntu Linux. Can you tell me how to open the p7s file you sent
me?

I googled for a p7s viewer, and it seems they work on Windows and Mac only.

Thanks

Michael

Re: calling C programs from Hadoop

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Uh...

So you want a batch system?  Look up PBS (Torque/Maui), SGE, or Condor.

Brian

On May 29, 2010, at 8:17 PM, Michael Robinson wrote:

>
> Thanks for your answers.
>
> I have read about Hadoop streaming and I think it is great. However, what
> I am trying to do is run a C program that I have, with its own data, and
> have Hadoop do the scheduling and run it on multiple nodes as a
> distributed system.
>
> My processing is NOT a map-and-reduce type of process, so I was thinking
> of either feeding the C program to Hadoop or writing a Java program that
> would call the C program and have Hadoop do its magic.
>
> Thanks
>
> Michael


Re: calling C programs from Hadoop

Posted by Asif Jan <As...@unige.ch>.
Look at Hadoop streaming; maybe it will be helpful to you.

asif
On May 29, 2010, at 8:31 PM, Michael Robinson wrote:

>
> I am new to Hadoop. I have successfully run Java programs from Hadoop
> and I would now like to call C programs from Hadoop.
>
> Thank you for your help
>
> Michael

****************************************************************************************
Asif Jan
Gaia Project
SixSq Sarl / ISDC Astrophysics Data Centre & Geneva Observatory
Chemin des Ecogia 16
CH-1290 Versoix
Switzerland

E-mail: asif.jan@unige.ch
Tel.: +41 22 37 92198
Fax: +41 22 37 92133
****************************************************************************************