Posted to common-user@hadoop.apache.org by Daniel Yehdego <dt...@miners.utep.edu> on 2011/07/22 16:41:17 UTC
Hadoop-streaming with a c binary executable as a mapper
Hi,
I am using hadoop-streaming to parallelize the processing of a large RNA
data set. I am using a C binary executable called pknotsRG as my mapper.
My command to run the job looks like:
HADOOP_HOME$ bin/hadoop
jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
-mapper /data/yehdego/hadoop-0.20.2/pknotsRG
-file /data/yehdego/hadoop-0.20.2/pknotsRG
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
-output /user/yehdego/RF-out
-reducer NONE
-verbose
and I keep getting the following error messages:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
FYI: I am inputting a file with lines of sequences, and the mapper is expected to take each line
and predict its 2D secondary structure. I tried the executable locally and it worked.
[yehdego@bulgaria hadoop-0.20.2]$ ./pknotsRG RF00028_B.bpseqL3G5_seg_Centered_Method.txt
AUGACUCUCUAAAUUGCAAAAUUUACCUUUGGAGGGAAAAGUUAUCAGGCCUGCACCUGAUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAUA
....(((((((((..............)))))))))...((((((((((......))))))))))[[[[[.{{{{{{...]]]]].....}}}}}}...
GCAAGACCGUCAAAUUGCGGGAAAAGGGU
......((((......)))).........
CAACAGCCGUUCAGUACCAAGUCUCAGGGGA
......((.((.((........)).)).)).
AACUUUGAGAUGGCCUUGCAAAGGAUAUGGUAAUAAGCUGACGGACAGGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAUUU
......[[[.{{{{]]]....(((((.((((.....((((..((((...))))....)).)).)))).)))))..}}}}......
CGGUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCGGGGAAGAAUAGGUAUUCUUCUCAUAAGAUAUAGUCGGACCUCUCCUUAAUGGGAGCU
.(((.......(((((...)))))..(((((..((((.....(((((((((....)))))))))....)))))))))..))).(((((....)))))..
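One difference worth checking here: the local run above appears to pass the file name on the command line, whereas Hadoop streaming feeds the mapper its input lines on standard input. A minimal sketch of a local test that mimics this follows. The function name local_stream_test is hypothetical, and whether pknotsRG needs an extra argument such as "-" to read standard input is an assumption to verify against the program's own documentation.

```shell
# local_stream_test FILE CMD [ARGS...]
# Pipe the first two lines of FILE into CMD on standard input, the way
# Hadoop streaming would deliver them, and report CMD's exit status.
local_stream_test() {
    input=$1
    shift
    head -n 2 "$input" | "$@"
    status=$?                      # exit status of CMD, the last pipeline stage
    echo "exit status: $status" >&2
    return "$status"
}
```

For example, `local_stream_test RF00028_B.bpseqL3G5_seg_Centered_Method.txt ./pknotsRG -` would show whether the binary copes with piped input before the job is submitted to the cluster.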
RE: Hadoop-streaming with a c binary executable as a mapper
Posted by Daniel Yehdego <dt...@miners.utep.edu>.
Thanks Joey for your quick response,
I have tried the suggestion you gave me and it's still not working. After I run:
bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG - -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out - verbose
I got the following task logs:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
syslog logs
2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_000000_0/work/./pknotsRG]
2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
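A note on the exit codes: by shell convention, a status above 128 usually means the process was terminated by a signal, and the signal number is the status minus 128. Under that reading, code 139 corresponds to signal 11, which is SIGSEGV on Linux, so the binary probably crashed on its input, whereas the code 1 from the first attempt is an ordinary error exit. The helper below is a small sketch of that arithmetic. describe_exit is a hypothetical name and not part of Hadoop.

```shell
# describe_exit CODE
# Interpret an exit status using the 128 + signal-number convention.
describe_exit() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l N prints the name of signal N (without the SIG prefix)
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $code"
    fi
}

describe_exit 1
describe_exit 139
```

If the second line reports signal 11, running the binary under the same conditions locally (same input on standard input, same working directory) is a reasonable next step.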
Regards,
Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehdego@miners.utep.edu
> CC: common-user@hadoop.apache.org
> From: joey@cloudera.com
> Subject: Re: Hadoop-streaming with a c binary executable as a mapper
> Date: Fri, 22 Jul 2011 11:34:08 -0400
> To: common-user@hadoop.apache.org
>
> Your executable needs to read lines from standard input. Try setting your mapper like this:
>
> > -mapper "/data/yehdego/hadoop-0.20.2/pknotsRG -"
>
> If that doesn't work, you may need to execute your C program from a shell script. The "-" I added to the command line says read from STDIN.
>
> -Joey
Re: Hadoop-streaming with a c binary executable as a mapper
Posted by Joey Echeverria <jo...@cloudera.com>.
Your executable needs to read lines from standard input. Try setting your mapper like this:
> -mapper "/data/yehdego/hadoop-0.20.2/pknotsRG -"
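The quotes matter here. Without them the shell hands the trailing dash to the hadoop command as a separate word instead of as part of the mapper command, and the PipeMapRed exec line in Daniel's task log lists only the binary path, which is consistent with the dash never reaching pknotsRG. The sketch below only illustrates the shell's word splitting; show_args is a hypothetical helper, and nothing here is specific to Hadoop.

```shell
# show_args: print how many arguments the shell passed, one per line.
show_args() {
    echo "$# arguments"
    for arg in "$@"; do
        printf '  [%s]\n' "$arg"
    done
}

# Unquoted: the dash arrives as its own word.
show_args -mapper /data/yehdego/hadoop-0.20.2/pknotsRG -

# Quoted: the path and the dash arrive together as one mapper command.
show_args -mapper "/data/yehdego/hadoop-0.20.2/pknotsRG -"
```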
If that doesn't work, you may need to execute your C program from a shell script. The "-" I added to the command line says read from STDIN.
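A wrapper along these lines might look like the sketch below. It is only an illustration under stated assumptions: the names run_mapper and MAPPER_BIN are hypothetical, the binary is assumed to sit in the task's working directory as ./pknotsRG (where -file places it), and passing "-" is assumed to make pknotsRG read standard input. Running the binary once per input line is a design choice made here so that a failing sequence is named on standard error, which ends up in the task's stderr log.

```shell
# Path to the real mapper binary; override MAPPER_BIN for local testing.
MAPPER_BIN="${MAPPER_BIN:-./pknotsRG}"

# run_mapper [ARGS...]
# Read records from standard input and run the binary once per non-empty
# line, feeding that line on the binary's standard input.
run_mapper() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        printf '%s\n' "$line" | "$MAPPER_BIN" "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "mapper failed with status $status on: $line" >&2
            return "$status"
        fi
    done
}
```

A script file (say run_mapper.sh, also a hypothetical name) containing this function followed by a final line `run_mapper -` could then be made executable and submitted with `-mapper run_mapper.sh -file run_mapper.sh -file /data/yehdego/hadoop-0.20.2/pknotsRG`.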
-Joey