Posted to common-user@hadoop.apache.org by Romeo Kienzler <ro...@ormium.de> on 2011/12/06 09:59:05 UTC

Question on Hadoop Streaming

Hi,

I've got the following setup for NGS read alignment:


A script that reads its input from stdin and writes to stdout:
------------------------------------------------------------
cat /root/bowtiestreaming.sh
cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
/home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 - 2> /root/bowtie.log



A file copied to HDFS:
------------------------------------------------------------
hadoop fs -put \
  SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
  SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
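Before copying the file to HDFS, it may be worth confirming that every line really is in the tab-delimited layout that bowtie's --12 option reads. As far as I understand the bowtie manual, that means 3 tab-separated fields (name, sequence, qualities) for an unpaired read and 5 for a paired one. A minimal sketch; check_tab12 is a hypothetical helper name:

```shell
# check_tab12 FILE: hypothetical helper, not part of bowtie or Hadoop.
# Succeeds only if every line has 3 (unpaired) or 5 (paired) tab-separated
# fields, the 1-read-per-line layout assumed here for bowtie's --12 input.
check_tab12() {
    awk -F '\t' '
        NF != 3 && NF != 5 {
            # Report each offending line on stderr and remember the failure.
            printf("line %d: %d fields\n", NR, NF) > "/dev/stderr"
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

Used as a guard, e.g. `check_tab12 SRR014475...10000 && hadoop fs -put ...`, so a malformed file never reaches HDFS.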

A streaming job invoked with only the mapper:
------------------------------------------------------------
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
  -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
  -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned \
  -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0
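The same map-only job can be written with the long names held in variables. This is a sketch, assuming that in this release -jobconf is deprecated in favour of the generic -D option and that generic options must come before the streaming-specific ones. HADOOP, STREAMING_JAR and run_map_only are hypothetical names; setting HADOOP=echo turns the function into a dry run that only prints the arguments:

```shell
# Hypothetical settings; override them from the environment if your layout differs.
HADOOP="${HADOOP:-hadoop}"
STREAMING_JAR="${STREAMING_JAR:-hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar}"

# run_map_only INPUT OUTPUT MAPPER
# Map-only streaming job: the generic -D option first, then streaming options.
run_map_only() {
    "$HADOOP" jar "$STREAMING_JAR" \
        -D mapred.reduce.tasks=0 \
        -input "$1" \
        -output "$2" \
        -mapper "$3"
}
```

A dry run with `HADOOP=echo` makes it easy to spot a mangled argument before submitting the real job.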

The output file cannot be found, even though it is displayed:
------------------------------------------------------------
hadoop fs -cat /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
11/12/06 09:07:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
cat: File does not exist: /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
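A streaming job writes its results as a directory of part files rather than a single file, and "File does not exist" suggests the output directory was never created at all (for example because the job failed). A small sketch that checks for the directory before printing the part files. show_output is a hypothetical helper, and it assumes that `hadoop fs -test -d` and glob patterns in `hadoop fs -cat` behave in this version as they do in the FsShell documentation:

```shell
HADOOP="${HADOOP:-hadoop}"

# show_output DIR: print a job's part files, or explain why it cannot.
show_output() {
    # -test -d exits 0 only if DIR exists and is a directory.
    if ! "$HADOOP" fs -test -d "$1" >/dev/null 2>&1; then
        echo "no output directory: $1 (did the job fail?)" >&2
        return 1
    fi
    # Quoted so the glob is expanded by Hadoop on HDFS, not by the local shell.
    "$HADOOP" fs -cat "$1/part-*"
}
```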


The input file looks like this (tab-separated):
head SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000

@SRR014475.1 :1:1:108:111 length=36     GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA    I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
@SRR014475.2 :1:1:112:26 length=36      GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC    I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#&#
@SRR014475.3 :1:1:101:937 length=36     GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
@SRR014475.4 :1:1:124:64 length=36      GAACACATAGAACAACAGGATTCGCCAGAACACCTG    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
@SRR014475.5 :1:1:108:897 length=36     GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT    I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
@SRR014475.6 :1:1:106:14 length=36      GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT    I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
@SRR014475.7 :1:1:118:934 length=36     GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT    III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
@SRR014475.8 :1:1:123:8 length=36       GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN    I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
@SRR014475.9 :1:1:118:88 length=36      GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC    IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
@SRR014475.10 :1:1:92:122 length=36     ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA    IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;


and running the script locally on it gives this result:

cat SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 | ./bowtiestreaming.sh | head
@SRR014475.3 :1:1:101:937 length=36     +       gi|110640213|ref|NC_008253.1|   3393863 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA    IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G  0       7:T>C,27:G>T
@SRR014475.4 :1:1:124:64 length=36      +       gi|110640213|ref|NC_008253.1|   2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG    IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;  0       30:T>C
@SRR014475.5 :1:1:108:897 length=36     +       gi|110640213|ref|NC_008253.1|   4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT    I0I:I'IIII+IG3II46II0>C@=III()+:+2&$  0       5:C>A,28:G>T,29:C>G,30:A>T,34:C>T
@SRR014475.9 :1:1:118:88 length=36      -       gi|110640213|ref|NC_008253.1|   3598410 GTGGCGCGTTACCTGGTAGCGCGCCATTTTGTTTCC    %$#%$')+*;*61+1;4AAIGIIIIIIIIIIIIIII  0
@SRR014475.15 :1:1:87:967 length=36     +       gi|110640213|ref|NC_008253.1|   4474247 GACTACACGATCGCCTGCCTTAATATTCTTTACACC    IIIIIIIIIIIIA27II7CIII*I5I+FIIII?II'  0       6:G>A,26:G>T
@SRR014475.20 :1:1:108:121 length=36    -       gi|110640213|ref|NC_008253.1|   37761   AAAAAATGCATATTGTTTTAGAGTGTGATTATTAGC    I<D4II'2I<IIC/;B?FIIIIIIIIIIIIIIIIII  0       12:C>T
@SRR014475.23 :1:1:75:54 length=36      +       gi|110640213|ref|NC_008253.1|   2465453 GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA    IIIIIIIIIIIICII<III;';29=9I.4%EE2)*'  0
@SRR014475.24 :1:1:89:904 length=36     -       gi|110640213|ref|NC_008253.1|   3216193 ATTAGTGTTAAGATTTCTATATTGTTGTTTTAGGCC    #%);%;$EI-;$%8%&I%I/+IIIIIIIIIIIIIII  0       18:C>T,21:G>T,30:C>T,31:T>G,34:A>T
@SRR014475.27 :1:1:74:887 length=36     -       gi|110640213|ref|NC_008253.1|   540567  AAACGTGGCGTTTCAGGGATCGTTTGCCTGCATTAC    *&(%9%0F3.@4;&?4I3I6%:9AI0HIIIIIIIII  0       34:C>A,35:C>A
@SRR014475.30 :1:1:123:73 length=36     +       gi|110640213|ref|NC_008253.1|   3391697 AAAAGATTGCGACTGACGGCGCAAATGCCCTCCGTT    IIIIIIIIICI:II3*<4.*'+%'&)&$;+;%;%;;  0       30:C>T,34:G>T
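To compare this local run with what the Hadoop job eventually writes, it can help to reduce bowtie's output to a few columns. This sketch assumes bowtie's default tab-separated column order as I understand it from the bowtie 1 manual: read name, strand, reference, 0-based offset, sequence, qualities, count of other alignments, and comma-separated mismatch descriptors. summarize_hits is a hypothetical name:

```shell
# summarize_hits: hypothetical helper. Reads bowtie's default output on stdin
# and prints read name, strand, reference, offset and number of mismatches.
summarize_hits() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
    {
        # Column 8 lists mismatches as "offset:ref>read,..."; it may be empty or absent.
        n = (($8 == "") ? 0 : split($8, parts, ","))
        print $1, $2, $3, $4, n
    }'
}
```

Splitting on tabs only matters here because the read names themselves contain spaces.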


Any ideas?

Best regards,

Romeo


-------------
Romeo Kienzler
r o m e o @ o r m i u m . d e


Re: Question on Hadoop Streaming

Posted by Romeo Kienzler <ro...@ormium.de>.
Hi Brock,

I'm not getting any errors.

I'm issuing the following command now:

hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
  -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
  -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned \
  -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0 -file bowtiestreaming.sh

The only errors I find with "cat hadoop-0.21.0/logs/* | grep Exception" are:
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/hadoop-0.21.0/logs/history/job_201112060917_0002_root at 2620416
2011-12-06 11:14:34,515 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13816: No such process
2011-12-06 11:14:43,039 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13862: No such process
2011-12-06 11:14:46,282 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13891: No such process
2011-12-06 11:14:49,841 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13978: No such process
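`cat hadoop-0.21.0/logs/* | grep Exception` only reads the files at the top level of the log directory and drops the file names. A sketch that searches the whole tree and prefixes every hit with its file and line number; find_exceptions is a hypothetical name. Per-attempt task output usually lives in a subdirectory (userlogs in this generation of Hadoop, though the exact layout varies by version), and the mapper script above sends bowtie's stderr to /root/bowtie.log on whichever node ran the task, so that file is worth checking as well:

```shell
# find_exceptions DIR: hypothetical helper. Recursively greps every regular
# file under DIR for exceptions and ERROR lines, printing file:line:text.
find_exceptions() {
    # /dev/null as an extra file forces grep to print file names even when
    # find hands it a single file.
    find "$1" -type f -exec grep -n -E 'Exception|ERROR' /dev/null {} +
}
```

For example: `find_exceptions hadoop-0.21.0/logs`.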


Best regards,

Romeo

On 12/06/2011 10:49 AM, Brock Noland wrote:
> Does your job end with an error?
>
> I am guessing what you want is:
>
> -mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'
>
> The first option says to use your script as the mapper, and the second
> says to ship your script as part of the job.
>
> Brock


Re: Question on Hadoop Streaming

Posted by Romeo Kienzler <ro...@ormium.de>.
Hi,

the following command works:

hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
  -input input -output output2 -mapper /root/bowtiestreaming.sh -reducer NONE

Best Regards,

Romeo

On 12/06/2011 10:49 AM, Brock Noland wrote:
> Does your job end with an error?
>
> I am guessing what you want is:
>
> -mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'
>
> The first option says to use your script as the mapper, and the second
> says to ship your script as part of the job.
>
> Brock


Re: Question on Hadoop Streaming

Posted by Brock Noland <br...@cloudera.com>.
Does your job end with an error?

I am guessing what you want is:

-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'

The first option says to use your script as the mapper, and the second
says to ship your script as part of the job.
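Brock's suggestion, wrapped in a function as a sketch: -file ships the local script with the job, and the mapper is then referred to by its bare file name, on the assumption (per the streaming documentation) that shipped files appear in each task's working directory. The -reducer NONE option is taken from the map-only command reported as working elsewhere in this thread. HADOOP, STREAMING_JAR and run_shipped_mapper are hypothetical names; HADOOP=echo gives a dry run:

```shell
# Hypothetical settings; override them from the environment if needed.
HADOOP="${HADOOP:-hadoop}"
STREAMING_JAR="${STREAMING_JAR:-hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar}"

# run_shipped_mapper INPUT OUTPUT LOCAL_SCRIPT
run_shipped_mapper() {
    # -file ships LOCAL_SCRIPT with the job; -mapper names it without a path
    # because the task is assumed to find it in its working directory.
    "$HADOOP" jar "$STREAMING_JAR" \
        -input "$1" \
        -output "$2" \
        -mapper "$(basename "$3")" \
        -file "$3" \
        -reducer NONE
}
```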

Brock
