Posted to common-user@hadoop.apache.org by Daniel Yehdego <dt...@miners.utep.edu> on 2011/07/22 16:41:17 UTC

Hadoop-streaming with a c binary executable as a mapper

Hi, 

I am using hadoop-streaming to parallelize the processing of a large RNA
data set. My mapper is a C binary executable called pknotsRG. My command
to run the job looks like this:

HADOOP_HOME$  bin/hadoop
jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
-mapper /data/yehdego/hadoop-0.20.2/pknotsRG
-file /data/yehdego/hadoop-0.20.2/pknotsRG 
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
-output /user/yehdego/RF-out 
-reducer NONE 
-verbose 

and I keep getting the following error messages:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

FYI: My input is a file with lines of sequences, and the mapper is expected to take each line 
and predict its 2D secondary structure. I ran the executable locally and it worked:

[yehdego@bulgaria hadoop-0.20.2]$ ./pknotsRG RF00028_B.bpseqL3G5_seg_Centered_Method.txt

AUGACUCUCUAAAUUGCAAAAUUUACCUUUGGAGGGAAAAGUUAUCAGGCCUGCACCUGAUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAUA
....(((((((((..............)))))))))...((((((((((......))))))))))[[[[[.{{{{{{...]]]]].....}}}}}}...  
GCAAGACCGUCAAAUUGCGGGAAAAGGGU
......((((......)))).........  
CAACAGCCGUUCAGUACCAAGUCUCAGGGGA
......((.((.((........)).)).)).  
AACUUUGAGAUGGCCUUGCAAAGGAUAUGGUAAUAAGCUGACGGACAGGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAUUU
......[[[.{{{{]]]....(((((.((((.....((((..((((...))))....)).)).)))).)))))..}}}}......  
CGGUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCGGGGAAGAAUAGGUAUUCUUCUCAUAAGAUAUAGUCGGACCUCUCCUUAAUGGGAGCU
.(((.......(((((...)))))..(((((..((((.....(((((((((....)))))))))....)))))))))..))).(((((....)))))..  

RE: Hadoop-streaming with a c binary executable as a mapper

Posted by Daniel Yehdego <dt...@miners.utep.edu>.
Thanks, Joey, for your quick response.

I tried the suggestion you gave me and it's still not working. After I run:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out - verbose

I got the following task logs:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)



syslog logs

2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_000000_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
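
A note on the two exit codes, since they differ between the runs. By the usual Unix convention, a status above 128 means the process was killed by a signal whose number is the status minus 128, so 139 would correspond to signal 11 (SIGSEGV, a segmentation fault), while 1 is an ordinary error exit. The snippet below is only a sketch of that convention in Python 3; describe_exit_code is a made-up helper name, and a program can in principle also return a value above 128 on its own.

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status using the common 128 + signal convention.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    if code == 0:
        return "success"
    if code < 0:
        # Python's subprocess module reports death by signal N as -N.
        signum = -code
    elif code > 128:
        # Shells (and similar launchers) report death by signal N as 128 + N.
        signum = code - 128
    else:
        return "exited with error status %d" % code
    try:
        return "killed by " + signal.Signals(signum).name
    except ValueError:
        return "killed by unknown signal %d" % signum


print(describe_exit_code(1))    # exited with error status 1
print(describe_exit_code(139))  # killed by SIGSEGV
```

If that reading is right, the second run got far enough to start pknotsRG and the program then crashed on the input it was handed, which is a different failure from the first run.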


Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehdego@miners.utep.edu

> CC: common-user@hadoop.apache.org
> From: joey@cloudera.com
> Subject: Re: Hadoop-streaming with a c binary executable as a mapper
> Date: Fri, 22 Jul 2011 11:34:08 -0400
> To: common-user@hadoop.apache.org
> 
> Your executable needs to read lines from standard in. Try setting your mapper like this:
> 
> > -mapper "/data/yehdego/hadoop-0.20.2/pknotsRG -"
> 
> If that doesn't work, you may need to execute your C program from a shell script. The - added to the command line tells the program to read from STDIN. 
> 
> -Joey

Re: Hadoop-streaming with a c binary executable as a mapper

Posted by Joey Echeverria <jo...@cloudera.com>.
Your executable needs to read lines from standard in. Try setting your mapper like this:

> -mapper "/data/yehdego/hadoop-0.20.2/pknotsRG -"

If that doesn't work, you may need to execute your C program from a shell script. The - added to the command line tells the program to read from STDIN. 
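
Something along these lines could serve as the wrapper. It is only a sketch, written in Python 3 rather than shell, and it rests on assumptions: that Python 3 is installed on the task nodes, that pknotsRG accepts a sequence on STDIN when given "-", and that starting the program once per input line is acceptable. The names fold_line and run_mapper are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a streaming mapper that wraps an external binary."""
import subprocess
import sys


def fold_line(line, command):
    """Run `command` once, sending `line` on its stdin.

    Returns (exit_code, stdout_text, stderr_text).
    """
    proc = subprocess.run(
        command,
        input=line.rstrip("\n") + "\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def run_mapper(lines, out, err, command):
    """Feed each non-empty line to `command`; return the number of failures."""
    failures = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of handing them to the binary
        code, stdout, stderr = fold_line(line, command)
        if code != 0:
            # Log the failure to the task's stderr and keep going, so one bad
            # sequence does not fail the whole map task.
            failures += 1
            err.write("exit %d for input: %s\n" % (code, line.strip()))
            err.write(stderr)
            continue
        out.write(stdout)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to wrap,
    # e.g.: wrapper.py ./pknotsRG -
    run_mapper(sys.stdin, sys.stdout, sys.stderr, sys.argv[1:])
```

Saved as wrapper.py (a hypothetical name) and made executable, it would be shipped with a second -file option next to the binary and used as -mapper "wrapper.py ./pknotsRG -". The offending sequences then show up in the task's stderr log instead of only an exit code.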

-Joey

