Posted to common-user@hadoop.apache.org by Daniel Yehdego <dt...@miners.utep.edu> on 2011/08/02 00:13:03 UTC

RE: Hadoop-streaming using binary executable c program

Hi Bobby, 

I have written a small Perl script which does the following job:

Assume we have an output from the mapper

MAP1
<RNA-1>
<STRUCTURE-1>

MAP2
<RNA-2>
<STRUCTURE-2>

MAP3
<RNA-3>
<STRUCTURE-3>

What the script should do is reduce it in the following manner:
<RNA-1><RNA-2><RNA-3>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3>\n
The script looks like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my @handles = map { open my $h, '<', $_; $h } @ARGV;

while (@handles){
    @handles = grep { ! eof $_ } @handles;
    my @lines = map { my $v = <$_>; chomp $v; $v } @handles;
    print join(' ', @lines), "\n";
}

close $_ for @handles;

This should work for any input from the mapper. But when I ran Hadoop streaming with the above code as my reducer, the job succeeded
but the output files were empty, and I couldn't find out why.

 bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar 
-mapper ./hadoopPknotsRG 
-file /data/yehdego/hadoop-0.20.2/pknotsRG 
-file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG 
-reducer ./reducer.pl 
-file /data/yehdego/hadoop-0.20.2/reducer.pl  
-input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt 
-output /user/yehdego/RFR2-out - verbose

Any help or suggestion is really appreciated. I am just stuck here for the weekend.
 
Regards, 

Daniel T. Yehdego
Computational Science Program 
University of Texas at El Paso, UTEP 
dtyehdego@miners.utep.edu

> From: evans@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Thu, 28 Jul 2011 07:12:11 -0700
> Subject: Re: Hadoop-streaming using binary executable c program
> 
> I am not completely sure what you are getting at.  It looks like the output of your C program is the following (and this is just a guess).  NOTE: \t stands for the tab character, which streaming uses to separate the key from the value; \n stands for the newline character, which separates individual records.
> <RNA-1>\t<STRUCTURE-1>\n
> <RNA-2>\t<STRUCTURE-2>\n
> <RNA-3>\t<STRUCTURE-3>\n
> ...
> 
> 
> And you want the output to look like
> <RNA-1><RNA-2><RNA-3>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3>\n
> 
> You could use a reduce to do this, but the issue here is with the shuffle in between the maps and the reduces.  The Shuffle will group by the key to send to the reducers and then sort by the key.  So in reality your map output looks something like
> 
> FROM MAP 1:
> <RNA-1>\t<STRUCTURE-1>\n
> <RNA-2>\t<STRUCTURE-2>\n
> 
> FROM MAP 2:
> <RNA-3>\t<STRUCTURE-3>\n
> <RNA-4>\t<STRUCTURE-4>\n
> 
> FROM MAP 3:
> <RNA-5>\t<STRUCTURE-5>\n
> <RNA-6>\t<STRUCTURE-6>\n
> 
> If you send it to a single reducer (the only way to get a single file), then the input to the reducer will be sorted alphabetically by the RNA, and the order of the input will be lost.  You can work around this by giving each line a unique number that is in the order you want it to be output, but doing this would require you to write some code.  I would suggest that you do it with a small shell script after all the maps have completed to splice them together.
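
A minimal sketch of such a splicing step, under stated assumptions rather than as the script anyone in this thread actually used. The function name splice_structures is hypothetical. It assumes the map-only output has alternating non-blank lines (a sequence, then its structure followed by whitespace and an energy value), and that reading the part files in name order reproduces the original segment order.

```shell
#!/bin/sh
# Hypothetical helper: splice map-only output into one sequence line and
# one structure line. Assumes non-blank lines alternate: sequence, structure.
splice_structures() {
    awk '
        NF == 0 { next }                      # skip blank separator lines
        { n++ }                               # count non-blank lines
        n % 2 == 1 { seq = seq $1; next }     # odd lines: RNA sequence
        { structure = structure $1 }          # even lines: structure; $1 drops the trailing energy
        END {
            if (n > 0) {
                print seq
                print structure
            }
        }'
}
```

It might be used after the job finishes with something like: hadoop fs -cat '/user/yehdego/RF-out/part-*' | splice_structures > spliced.txt (the output file name is only an example).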
> 
> --
> Bobby
> 
> On 7/27/11 2:55 PM, "Daniel Yehdego" <dt...@miners.utep.edu> wrote:
> 
> 
> 
> Hi Bobby,
> 
> I just want to ask you if there is a way of using a reducer, or something like concatenation, to glue the outputs from my mappers together and output
> them as a single file and a single segment of the predicted RNA 2D structure?
> 
> FYI: I have used -reducer NONE before:
> 
> HADOOP_HOME$ bin/hadoop jar
> /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
> ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
> /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
> /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
> /user/yehdego/RF-out -reducer NONE -verbose
> 
> and a sample of my mapper output from two different slave nodes looks like this:
> 
> AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAACCCCAAAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC    and
> [[[[[..................((((.(((((((...............))))))).))))............{{{{....]]]]].....}}}}....  (-13.46)
> 
> GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUUUUUCU
> ((((.(((((....((.((((((.......))))))))...))))).)))).  (-11.00)
> 
> and I want to concatenate and output them as a single predicted RNA sequence structure:
> 
> AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAACCCCAAAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUUUUUCU
> 
> [[[[[..................((((.(((((((...............))))))).))))............{{{{....]]]]].....}}}}....((((.(((((....((.((((((.......))))))))...))))).)))).
> 
> 
> Regards,
> 
> Daniel T. Yehdego
> Computational Science Program
> University of Texas at El Paso, UTEP
> dtyehdego@miners.utep.edu
> 
> > From: dtyehdego@miners.utep.edu
> > To: common-user@hadoop.apache.org
> > Subject: RE: Hadoop-streaming using binary executable c program
> > Date: Tue, 26 Jul 2011 16:23:10 +0000
> >
> >
> > Good afternoon Bobby,
> >
> > Thanks so much, now it's working excellently, and the speed is also reasonable. Once again, thank you.
> >
> > Regards,
> >
> > Daniel T. Yehdego
> > Computational Science Program
> > University of Texas at El Paso, UTEP
> > dtyehdego@miners.utep.edu
> >
> > > From: evans@yahoo-inc.com
> > > To: common-user@hadoop.apache.org
> > > Date: Mon, 25 Jul 2011 14:47:34 -0700
> > > Subject: Re: Hadoop-streaming using binary executable c program
> > >
> > > This is likely to be slow and it is not ideal.  The ideal would be to modify pknotsRG to be able to read from stdin, but that may not be possible.
> > >
> > > The shell script would probably look something like the following
> > >
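> > > #!/bin/sh
> > > rm -f temp.txt
> > > # Copy every line from stdin into a temporary file, since pknotsRG only reads files.
> > > while IFS= read -r line
> > > do
> > >   printf '%s\n' "$line" >> temp.txt
> > > done
> > > exec pknotsRG temp.txt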
> > >
> > > Place it in a file, say hadoopPknotsRG.  Then you probably want to run
> > >
> > > chmod +x hadoopPknotsRG
> > >
> > > After that you want to test it with
> > >
> > > hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | ./hadoopPknotsRG
> > >
> > > If that works then you can try it with Hadoop streaming
> > >
> > > HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose
> > >
> > > --Bobby
> > >
> > > On 7/25/11 3:37 PM, "Daniel Yehdego" <dt...@miners.utep.edu> wrote:
> > >
> > >
> > >
> > > Good afternoon Bobby,
> > >
> > > Thanks, you were a great help in finding out what the problem was. After I ran the command line you suggested, I found out that there was a segmentation fault.
> > > The binary executable program pknotsRG only reads a file with a sequence in it. This means there should be a shell script, as you have said, that will take the data coming
> > > from stdin and write it to a temporary file. Do you have any idea how to do this in a shell script? The thing is, I am from a biology background and don't have much experience in CS.
> > > Looking forward to hearing from you. Thanks so much.
> > >
> > > Regards,
> > >
> > > Daniel T. Yehdego
> > > Computational Science Program
> > > University of Texas at El Paso, UTEP
> > > dtyehdego@miners.utep.edu
> > >
> > > > From: evans@yahoo-inc.com
> > > > To: common-user@hadoop.apache.org
> > > > Date: Fri, 22 Jul 2011 12:39:08 -0700
> > > > Subject: Re: Hadoop-streaming using binary executable c program
> > > >
> > > > I would suggest that you do the following to help you debug.
> > > >
> > > > hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -
> > > >
> > > > This is simulating what Hadoop streaming is doing.  Here we are taking the first 2 lines out of the input file and feeding them to the stdin of pknotsRG.  The first step is to make sure that you can get your program to run correctly with something like this.  You may need to change the command line to pknotsRG to get it to read the data it is processing from stdin, instead of from a file.  Alternatively, you may need to write a shell script that will take the data coming from stdin, write it to a file, and then call pknotsRG on that temporary file.  Once you have this working, you should try it again with streaming.
> > > >
> > > > --Bobby Evans
> > > >
> > > > On 7/22/11 12:31 PM, "Daniel Yehdego" <dt...@miners.utep.edu> wrote:
> > > >
> > > >
> > > >
> > > > Hi Bobby, Thanks for the response.
> > > >
> > > > After I tried the following command:
> > > >
> > > > bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -mapper /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -  -file /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG  -reducer NONE -input /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out - verbose
> > > >
> > > > I got the following stderr logs:
> > > >
> > > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
> > > >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > >         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > >         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >
> > > >
> > > >
> > > > syslog logs
> > > >
> > > > 2011-07-22 13:02:27,467 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> > > > 2011-07-22 13:02:27,913 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
> > > > 2011-07-22 13:02:28,149 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/data/yehdego/hadoop_tmp/dfs/local/taskTracker/jobcache/job_201107181535_0079/attempt_201107181535_0079_m_000000_0/work/./pknotsRG]
> > > > 2011-07-22 13:02:28,242 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
> > > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
> > > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
> > > > 2011-07-22 13:02:28,267 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
> > > > 2011-07-22 13:02:28,361 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> > > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
> > > >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > >         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > >         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > 2011-07-22 13:02:28,395 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Daniel T. Yehdego
> > > > Computational Science Program
> > > > University of Texas at El Paso, UTEP
> > > > dtyehdego@miners.utep.edu
> > > >
> > > > > From: evans@yahoo-inc.com
> > > > > To: common-user@hadoop.apache.org; dtyehdego@miners.utep.edu
> > > > > Date: Fri, 22 Jul 2011 09:12:18 -0700
> > > > > Subject: Re: Hadoop-streaming using binary executable c program
> > > > >
> > > > > It looks like it tried to run your program and the program exited with a 1, not a 0.  What are the stderr logs like for the mappers that were launched?  You should be able to access them through the Web GUI.  You might also want to add some stderr log messages to your C program, to be able to debug how far along it gets before exiting.
> > > > >
> > > > > --Bobby Evans
> > > > >
> > > > > On 7/22/11 9:19 AM, "Daniel Yehdego" <dt...@miners.utep.edu> wrote:
> > > > >
> > > > > I am trying to parallelize some very long RNA sequences for the sake of
> > > > > predicting their RNA 2D structures. I am using a binary executable C
> > > > > program called pknotsRG as my mapper. I tried the following bin/hadoop
> > > > > command:
> > > > >
> > > > > HADOOP_HOME$ bin/hadoop
> > > > > jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar
> > > > > -mapper /data/yehdego/hadoop-0.20.2/pknotsRG
> > > > > -file /data/yehdego/hadoop-0.20.2/pknotsRG
> > > > > -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt
> > > > > -output /user/yehdego/RF-out -reducer NONE -verbose
> > > > >
> > > > > but I keep getting the following error message:
> > > > >
> > > > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> > > > >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> > > > >         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> > > > >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> > > > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > > >         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > > >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > >
> > > > > FYI: my input file is RF00028_B.bpseqL3G5_seg_Centered_Method.txt, which
> > > > > is a chunk of RNA sequences. The mapper is expected to read the input
> > > > > file line by line and output the predicted
> > > > > structure for each line of sequence, for a specified number of maps. Any
> > > > > help on this problem is really appreciated. Thanks.
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> 
>

Re: Hadoop-streaming using binary executable c program

Posted by Robert Evans <ev...@yahoo-inc.com>.

What I usually do to debug streaming is to print things to STDERR.  STDERR shows up in the logs for the attempt and you should be able to see better what is happening.  I am not an expert on Perl, so I am not sure if you have to pass in something special to get your Perl script to read from STDIN.  I see you opening handles to all of the files on the command line, but I am not sure how that works with stdin, because whatever you run through streaming has to read from stdin and write to stdout.  A quick local check is to pipe data into it:

cat map1.txt map2.txt map3.txt | ./reducer.pl

--Bobby
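
The stdin/stdout contract described above can be sketched as a small shell reducer. This is only an illustration, not the original reducer.pl: the function name join_records is hypothetical, and it assumes every input record is a single line of the form <RNA>\t<STRUCTURE>. It joins records in whatever order they arrive, so it does not undo the shuffle's sort by key.

```shell
#!/bin/sh
# Hypothetical stdin-based reducer sketch.
# Assumes each input line is "<RNA>\t<STRUCTURE>"; takes no file arguments.
join_records() {
    awk -F '\t' '
        { rna = rna $1; structure = structure $2; n++ }
        END {
            # Diagnostics go to stderr so they show up in the task attempt logs.
            printf "reducer: joined %d records\n", n | "cat 1>&2"
            # Emit one tab-separated output record, only if any input arrived.
            if (n > 0) printf "%s\t%s\n", rna, structure
        }'
}
# When used as a streaming reducer, the script would end with a bare call:
#   join_records
```

A script that only opens the files named in its arguments has nothing to read when streaming starts it with no arguments, which would be consistent with a successful job that produces empty output; the stderr line above is one way to confirm how many records the reducer really received.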
