Posted to common-user@hadoop.apache.org by Franco Nazareno <fr...@gmail.com> on 2011/03/28 04:51:14 UTC
Hadoop for Bioinformatics
Good day everyone!
First, I want to congratulate the group on this wonderful project. It has
opened up new ideas and solutions in computing and technology. I'm
excited to learn more about it and discover the possibilities of Hadoop and
its components.
I have a question related to my studies. I'm currently pursuing a PhD in
Bioinformatics, and I would like a (rough) idea of whether it's possible to
use a Hadoop cluster to perform DNA sequence alignment. My basic idea is
something like a string search over huge data files stored in HDFS, with the
application using MapReduce for the searching and computing. As the Hadoop
paradigm implies, it doesn't serve interactive applications well, and I think
this kind of searching is a "write-once, read-many" application.
I hope you don't mind my question. It would be great to hear your comments
or suggestions about this.
Thanks and more power!
Franco
Re: Hadoop for Bioinformatics
Posted by "Tsz Wo (Nicholas), Sze" <s2...@yahoo.com>.
Hi Franco,
I recall that there are some Hadoop-BLAST research projects. For example,
see
- http://www.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf
- http://salsahpc.indiana.edu/tutorial/hadoopblast.html
Nicholas
Re: Hadoop for Bioinformatics
Posted by Bibek Paudel <et...@gmail.com>.
On Mon, Mar 28, 2011 at 4:51 AM, Franco Nazareno
<fr...@gmail.com> wrote:
> Good day everyone!
>
>
>
> First, I want to congratulate the group for this wonderful project. It did
> open up new ideas and solutions in computing and technology-wise. I'm
> excited to learn more about it and discover possibilities using Hadoop and
> its components.
>
>
>
> Well I just want to ask this with regards to my study. Currently I'm
> studying my PhD course in Bioinformatics, and my question is that can you
> give me a (rough) idea if it's possible to use Hadoop cluster in achieving a
> DNA sequence alignment? My basic idea for this goes something like a string
> search out of a huge data files stored in HDFS, and the application uses
> MapReduce in searching and computing. As the Hadoop paradigm implies, it
> doesn't serve well in interactive applications, and I think this kind of
> searching is a "write-once, read-many" application.
Are you looking for something like a "distributed grep?" The hadoop
package comes with some examples, and 'grep' is one of them.
Please see: http://wiki.apache.org/hadoop/Grep and
http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html .
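A minimal sketch of the map and reduce steps behind such a distributed grep,
written as plain Python that runs locally rather than as the Hadoop example
itself. The function names (map_line, reduce_counts, distributed_grep) and the
sample reads are hypothetical, chosen only for illustration. On a cluster the
framework would run the map step on each HDFS input split in parallel and
group the emitted pairs by key before the reduce step.

```python
import re
from collections import defaultdict


def map_line(line, pattern):
    """Map step: emit (matched_text, 1) for every non-overlapping match in one line."""
    for match in re.finditer(pattern, line):
        yield match.group(0), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def distributed_grep(lines, pattern):
    """Local stand-in for the whole job: map every line, then reduce all pairs."""
    pairs = (pair for line in lines for pair in map_line(line, pattern))
    return reduce_counts(pairs)


# Hypothetical sample reads; real input would be lines of files stored in HDFS.
reads = ["ACGTTGCAACGT", "TTGACGTA", "GGGCCC"]
counts = distributed_grep(reads, r"ACGT")
print(counts)
```

Note that this does exact (regular expression) matching only. Alignment tools
such as BLAST or BWA tolerate mismatches and gaps, so a grep-style job is a
starting point for the string-search idea rather than a sequence aligner.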
Let us know if you are looking for something else.
-b
Re: Hadoop for Bioinformatics
Posted by Kiss Tibor <ki...@gmail.com>.
Hi Franco,
We are using Hadoop for next-gen sequence alignment.
Earlier we had a solution built on a classic programming model, but we are
currently upgrading our software services to an M/R model based on Hadoop.
We have moved most of our classic algorithms to Hadoop, and I can say that
everything is becoming more manageable.
We are running Hadoop in the cloud and/or in the datacenter. Another
challenge, especially with the cloud, is how to transfer the data, because
in bioinformatics the amount of data is usually very large.
Currently I am working on an open-source version of Amazon multipart upload,
which will be available in the next release of JClouds
(http://code.google.com/p/jclouds/wiki/BlobStore). Here are the starting ideas
(http://www.slideshare.net/jclouds/big-data-in-real-life-a-study-on-s3-multipart-uploads)
and a sample client app
(https://github.com/jclouds/jclouds-examples/tree/master/blobstore-largeblob).
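To illustrate the general idea behind a multipart upload, here is a minimal
sketch of how a large file can be divided into numbered parts that could then
be transferred independently. It does not use the JClouds or Amazon S3 APIs;
the function name plan_parts and the sizes in the example are hypothetical,
and real services impose their own limits on part size and part count.

```python
def plan_parts(total_size, part_size):
    """Return a list of (part_number, offset, length) tuples covering total_size bytes.

    Part numbers start at 1; the last part may be shorter than part_size.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


# Hypothetical example: a 10-byte "file" split into parts of at most 4 bytes.
example_parts = plan_parts(10, 4)
print(example_parts)
```

Each tuple describes one byte range, so the parts can be uploaded in parallel
or retried individually if one transfer fails.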
If you want to follow new results on Twitter
(http://twitter.com/#%21/tiborkisstibor),
you are invited. I plan to release a paper with results of the data transfer
operations based on this open-source approach.
Also, we will soon release a version of our cloud-based service stack
that is fully based on Hadoop.
Tibor
Re: Hadoop for Bioinformatics
Posted by Luca Pireddu <pi...@crs4.it>.
On March 28, 2011 04:51:14 Franco Nazareno wrote:
>
> Well I just want to ask this with regards to my study. Currently I'm
> studying my PhD course in Bioinformatics, and my question is that can you
> give me a (rough) idea if it's possible to use Hadoop cluster in achieving
> a DNA sequence alignment? My basic idea for this goes something like a
> string search out of a huge data files stored in HDFS, and the application
> uses MapReduce in searching and computing. As the Hadoop paradigm implies,
> it doesn't serve well in interactive applications, and I think this kind
> of searching is a "write-once, read-many" application.
I'll add some relevant citations:
An overview of the Hadoop/MapReduce/HBase framework and its current
applications in bioinformatics
http://www.biomedcentral.com/1471-2105/11/S12/S1
Biodoop: Bioinformatics on Hadoop
http://www.computer.org/portal/web/csdl/doi/10.1109/ICPPW.2009.37
CloudBurst: highly sensitive read mapping with MapReduce
http://bioinformatics.oxfordjournals.org/content/25/11/1363.short
CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources
for Bioinformatics Applications
http://www.computer.org/portal/web/csdl/doi/10.1109/eScience.2008.62
--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452
RE: Hadoop for Bioinformatics
Posted by Evert Lammerts <Ev...@sara.nl>.
> The short answer is yes! At CRS4 we are working on this very problem.
>
> We have implemented a Hadoop-based workflow to perform short read
> alignment to support DNA sequencing activities in our lab. Its alignment
> operation is based on (and therefore equivalent to) BWA. We have written a
> paper about it which will appear in the coming months, and we are working
> on an open source release, but alas we haven't completed that task yet.
>
> We have also implemented a Hadoop-based distributed blast alignment
> program, in case you're working with long fragments. It's currently being
> used by our collaborators to align viral DNA segments.
>
> In either case, if you're interested we can let you have an advance
> release of either program so you can try them out.
Hi Luca,
Could you send me an advance release of your software? I work for the Dutch national center for scientific computing, and I will give a workshop on Hadoop to bioinformaticians at a large bioinformatics conference (http://www.nbic.nl/about-nbic/nbic-conferences/nbic-conference-2011/). Lots of people there work with BWA- and BLAST-type applications (among others in the BBMRI project, which I think CRS4 is involved in as well), so BWA on Hadoop could be a great case study.
Let me know!
Cheers,
Evert
Re: Hadoop for Bioinformatics
Posted by Luca Pireddu <pi...@crs4.it>.
On March 28, 2011 04:51:14 Franco Nazareno wrote:
> Good day everyone!
And a good day to you Franco!
> First, I want to congratulate the group for this wonderful project. It did
> open up new ideas and solutions in computing and technology-wise. I'm
> excited to learn more about it and discover possibilities using Hadoop and
> its components.
>
>
> Well I just want to ask this with regards to my study. Currently I'm
> studying my PhD course in Bioinformatics, and my question is that can you
> give me a (rough) idea if it's possible to use Hadoop cluster in achieving
> a DNA sequence alignment? My basic idea for this goes something like a
> string search out of a huge data files stored in HDFS, and the application
> uses MapReduce in searching and computing. As the Hadoop paradigm implies,
> it doesn't serve well in interactive applications, and I think this kind
> of searching is a "write-once, read-many" application.
>
>
>
> I hope you don't mind my question. And it'll be great hearing your comments
> or suggestions about this.
>
>
>
> Thanks and more power!
>
> Franco
The short answer is yes! At CRS4 we are working on this very problem.
We have implemented a Hadoop-based workflow to perform short read alignment to
support DNA sequencing activities in our lab. Its alignment operation is
based on (and therefore equivalent to) BWA. We have written a paper about it
which will appear in the coming months, and we are working on an open source
release, but alas we haven't completed that task yet.
We have also implemented a Hadoop-based distributed blast alignment program,
in case you're working with long fragments. It's currently being used by our
collaborators to align viral DNA segments.
In either case, if you're interested we can let you have an advance release of
either program so you can try them out.
--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452