Posted to common-user@hadoop.apache.org by Romeo Kienzler <ro...@ormium.de> on 2011/12/06 09:59:05 UTC
Question on Hadoop Streaming
Hi,
I've got the following setup for NGS read alignment:
A script that reads from stdin and writes to stdout:
------------------------------------------------------------
cat /root/bowtiestreaming.sh
cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
/home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 - 2> /root/bowtie.log
A file copied to HDFS:
------------------------------------------------------------
hadoop fs -put \
    SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
    SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
A streaming job invoked with only the mapper:
------------------------------------------------------------
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
    -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned \
    -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0
The file cannot be found even though it is displayed:
------------------------------------------------------------
hadoop fs -cat /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
11/12/06 09:07:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
cat: File does not exist: /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
The file looks like this (tab-separated):
head SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000
@SRR014475.1 :1:1:108:111 length=36 GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
@SRR014475.2 :1:1:112:26 length=36 GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#&#
@SRR014475.3 :1:1:101:937 length=36 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
@SRR014475.4 :1:1:124:64 length=36 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
@SRR014475.5 :1:1:108:897 length=36 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
@SRR014475.6 :1:1:106:14 length=36 GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
@SRR014475.7 :1:1:118:934 length=36 GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
@SRR014475.8 :1:1:123:8 length=36 GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
@SRR014475.9 :1:1:118:88 length=36 GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
@SRR014475.10 :1:1:92:122 length=36 ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;
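
For readers starting from standard four-line FASTQ, the one-read-per-line layout shown here (name, sequence and quality separated by tabs, which is what bowtie's --12 option reads) could be produced roughly as follows. This is only a sketch and not necessarily how the file in this thread was generated; the function name fastq_to_tab and the sample record are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): flatten four-line FASTQ records
# into one tab-separated line per read: name, sequence, quality.
fastq_to_tab() {
    # paste joins each group of four input lines with tabs;
    # cut then drops the third field, the "+" separator line.
    paste - - - - | cut -f 1,2,4
}

# Example with a single made-up record:
printf '@read1\nACGT\n+\nIIII\n' | fastq_to_tab
```

The sketch assumes every record spans exactly four lines; FASTQ files with wrapped sequence lines would need a real parser.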
and the result like this:
cat SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 | ./bowtiestreaming.sh | head
@SRR014475.3 :1:1:101:937 length=36 + gi|110640213|ref|NC_008253.1| 3393863 GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G 0 7:T>C,27:G>T
@SRR014475.4 :1:1:124:64 length=36 + gi|110640213|ref|NC_008253.1| 2288633 GAACACATAGAACAACAGGATTCGCCAGAACACCTG IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+; 0 30:T>C
@SRR014475.5 :1:1:108:897 length=36 + gi|110640213|ref|NC_008253.1| 4389356 GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT I0I:I'IIII+IG3II46II0>C@=III()+:+2&$ 0 5:C>A,28:G>T,29:C>G,30:A>T,34:C>T
@SRR014475.9 :1:1:118:88 length=36 - gi|110640213|ref|NC_008253.1| 3598410 GTGGCGCGTTACCTGGTAGCGCGCCATTTTGTTTCC %$#%$')+*;*61+1;4AAIGIIIIIIIIIIIIIII 0
@SRR014475.15 :1:1:87:967 length=36 + gi|110640213|ref|NC_008253.1| 4474247 GACTACACGATCGCCTGCCTTAATATTCTTTACACC IIIIIIIIIIIIA27II7CIII*I5I+FIIII?II' 0 6:G>A,26:G>T
@SRR014475.20 :1:1:108:121 length=36 - gi|110640213|ref|NC_008253.1| 37761 AAAAAATGCATATTGTTTTAGAGTGTGATTATTAGC I<D4II'2I<IIC/;B?FIIIIIIIIIIIIIIIIII 0 12:C>T
@SRR014475.23 :1:1:75:54 length=36 + gi|110640213|ref|NC_008253.1| 2465453 GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA IIIIIIIIIIIICII<III;';29=9I.4%EE2)*' 0
@SRR014475.24 :1:1:89:904 length=36 - gi|110640213|ref|NC_008253.1| 3216193 ATTAGTGTTAAGATTTCTATATTGTTGTTTTAGGCC #%);%;$EI-;$%8%&I%I/+IIIIIIIIIIIIIII 0 18:C>T,21:G>T,30:C>T,31:T>G,34:A>T
@SRR014475.27 :1:1:74:887 length=36 - gi|110640213|ref|NC_008253.1| 540567 AAACGTGGCGTTTCAGGGATCGTTTGCCTGCATTAC *&(%9%0F3.@4;&?4I3I6%:9AI0HIIIIIIIII 0 34:C>A,35:C>A
@SRR014475.30 :1:1:123:73 length=36 + gi|110640213|ref|NC_008253.1| 3391697 AAAAGATTGCGACTGACGGCGCAAATGCCCTCCGTT IIIIIIIIICI:II3*<4.*'+%'&)&$;+;%;%;; 0 30:C>T,34:G>T
Any ideas?
best Regards,
Romeo
-------------
Romeo Kienzler
r o m e o @ o r m i u m . d e
Re: Question on Hadoop Streaming
Posted by Romeo Kienzler <ro...@ormium.de>.
Hi Brock,
I'm not getting any errors.
I'm issuing the following command now:
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
    -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned \
    -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0 \
    -file bowtiestreaming.sh
The only error I get using "cat hadoop-0.21.0/logs/* |grep Exception" is:
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/hadoop-0.21.0/logs/history/job_201112060917_0002_root at 2620416
2011-12-06 11:14:34,515 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13816: No such process
2011-12-06 11:14:43,039 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13862: No such process
2011-12-06 11:14:46,282 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13891: No such process
2011-12-06 11:14:49,841 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -13978: No such process
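
A variant of the log search above that keeps the file name and line number of each hit, which "cat ... | grep" drops, might look like the sketch below. The function name find_exceptions is hypothetical, and the -r flag assumes a grep that supports recursive search (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): search a log directory for
# exception lines and report where each one came from.
find_exceptions() {
    # -r: recurse into the directory given as $1
    # -n: prefix each match with its line number
    # Output format: <file>:<line>:<matching text>
    grep -rn "Exception" "$1"
}

# Usage, assuming the log directory from this thread:
#   find_exceptions hadoop-0.21.0/logs
```

Knowing which log file a message came from makes it easier to tell daemon warnings apart from failures of the task itself.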
best Regards,
Romeo
On 12/06/2011 10:49 AM, Brock Noland wrote:
> Does your job end with an error?
>
> I am guessing what you want is:
>
> -mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'
>
> The first option says to use your script as the mapper, and the second
> says to ship your script as part of the job.
>
> Brock
Re: Question on Hadoop Streaming
Posted by Romeo Kienzler <ro...@ormium.de>.
Hi,
the following command works:
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    -input input -output output2 -mapper /root/bowtiestreaming.sh -reducer NONE
Best Regards,
Romeo
Re: Question on Hadoop Streaming
Posted by Brock Noland <br...@cloudera.com>.
Does your job end with an error?
I am guessing what you want is:
-mapper bowtiestreaming.sh -file '/root/bowtiestreaming.sh'
The first option says to use your script as the mapper, and the second
says to ship your script as part of the job.
Brock
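
Put together, the suggestion above could be sketched as a small helper that assembles the full command line. This is an illustration only: the function name streaming_cmd is hypothetical, it prints the command instead of running it (running it needs a cluster), and it uses the -reducer NONE form that was reported to work elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print a map-only streaming
# command that ships a local script with -file and names it by its base
# name in -mapper, since shipped files land in the task's working directory.
streaming_cmd() {
    jar=$1; input=$2; output=$3; script=$4
    printf 'hadoop jar %s -input %s -output %s -mapper %s -file %s -reducer NONE\n' \
        "$jar" "$input" "$output" "$(basename "$script")" "$script"
}

# Example using the jar, paths and script named in this thread:
streaming_cmd hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    input output2 /root/bowtiestreaming.sh
```

Note that -file ships only the script itself. The bowtie binary, the e_coli index and the log path that the script refers to by absolute path would still have to exist on every node that runs a map task.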