
Posted to dev@mahout.apache.org by Dhruv Kumar <dk...@ecs.umass.edu> on 2011/03/08 17:03:25 UTC

Re: Markov Decision Process, Hidden Markov Models, GSOC 2011

On Tue, Feb 15, 2011 at 10:43 AM, Shannon Quinn <sq...@gatech.edu> wrote:

Hi Shannon,

>
> > It would be interesting to see how the current implementation can be
> > integrated into your Bioinformatics course work, what extensions you
> > need for this particular domain.
>
> I assume your HMM would be for the purpose of protein/nucleic acid sequence
> analysis and alignment. HMMs/MDPs are fairly domain-independent, but this
> would definitely be an interesting pursuit; I'm a grad student in comp bio,
> and I know a lot of folks who would love to use this.
>

Yes, I'm interested in applying HMMs to protein sequence analysis, and in
seeing whether they can predict the side chain conformation of protein
sequences on large data sets using Mahout. The following paper uses an HMM
in the form of a Dynamic Bayesian Network for the same purpose:

http://www.biomedcentral.com/1471-2105/11/306

It would be interesting to extend Mahout's existing HMM code so that it can
be distributed across large clusters. I'm also working with a professor here
at UMass on side chain structure prediction this semester, and he said that
we can craft a nice, tight GSoC proposal for this purpose.

Dhruv

>
> On the topic of GSoC, I meant to mention: I'm happy to volunteer my
> services as a potential mentor for the summer.
>
> Shannon
>
> Apologies for the brevity, this was sent from my iPhone

Re: Markov Decision Process, Hidden Markov Models, GSOC 2011

Posted by Ted Dunning <te...@gmail.com>.

The Apache application to GSOC needs to go through.

But you can proceed regardless.  Filing a JIRA as you suggest is a great
idea.

On Wed, Mar 9, 2011 at 6:45 AM, Dhruv Kumar <dk...@ecs.umass.edu> wrote:

> Is there anything else needed at this point, JIRA issue creation...?

Re: Markov Decision Process, Hidden Markov Models, GSOC 2011

Posted by Dhruv Kumar <dk...@ecs.umass.edu>.

>
> That would be excellent for a GSOC project.
>

Thanks!

To start things off, I'll be using the k-means code as a starting point to
parallelize the existing Baum-Welch algorithm, which Ted had mentioned last
year in 396.
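
As a rough illustration of why k-means is a natural template here: in both
algorithms each record (a point, or an observation sequence) can be turned
into partial sufficient statistics independently, the partial results are
summed, and the model is re-estimated from the sums. The sketch below is
plain Python rather than Mahout or Hadoop code. The function names
(expected_counts, add_counts, baum_welch_step) and the toy model are
assumptions made for illustration and are not taken from the existing HMM
implementation.

```python
from functools import reduce

def forward_backward(obs, pi, A, B):
    """Scaled forward/backward variables for one observation sequence."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scale = [0.0] * T
    alpha[0] = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            alpha[t] = [sum(alpha[t - 1][i] * A[i][j] for i in range(n))
                        * B[j][obs[t]] for j in range(n)]
        scale[t] = sum(alpha[t])  # rescale each step to avoid underflow
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scale[t + 1] for i in range(n)]
    return alpha, beta, scale

def expected_counts(obs, pi, A, B):
    """'Map' step: expected initial/transition/emission counts for one sequence."""
    n, m = len(A), len(B[0])
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    init = [alpha[0][i] * beta[0][i] for i in range(n)]
    trans = [[0.0] * n for _ in range(n)]
    emit = [[0.0] * m for _ in range(n)]
    for t, o in enumerate(obs):
        for i in range(n):
            emit[i][o] += alpha[t][i] * beta[t][i]
            if t + 1 < len(obs):
                for j in range(n):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
    return init, trans, emit

def add_counts(x, y):
    """'Reduce' step: element-wise sum of two sets of expected counts."""
    add_rows = lambda p, q: [[a + b for a, b in zip(r, s)] for r, s in zip(p, q)]
    return [a + b for a, b in zip(x[0], y[0])], add_rows(x[1], y[1]), add_rows(x[2], y[2])

def normalize(row):
    total = sum(row)  # assumes every state has non-zero expected mass
    return [v / total for v in row]

def baum_welch_step(sequences, pi, A, B):
    """One EM iteration: map over sequences, reduce the counts, re-estimate."""
    init, trans, emit = reduce(
        add_counts, (expected_counts(s, pi, A, B) for s in sequences))
    return normalize(init), [normalize(r) for r in trans], [normalize(r) for r in emit]

# Toy model: 2 hidden states, 2 output symbols (values are arbitrary).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
sequences = [[0, 1, 1, 0], [1, 1, 0]]
new_pi, new_A, new_B = baum_welch_step(sequences, pi, A, B)
```

Only expected_counts touches the raw data, so it is the part that would be
spread across a cluster. The summed counts are small (on the order of the
model size), much like the per-cluster partial sums in k-means.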

This paper parallelizes all three algorithms (forward, Viterbi, and
Baum-Welch) on the NVIDIA CUDA platform: http://liuchuan.org/pub/cuHMM.pdf

Is there anything else needed at this point, JIRA issue creation...? I am
running the code at home and working through the exercises in the MEAP
edition of Mahout in Action, and I will probably write some documentation
for the HMM implementation, since none exists on the wiki at this point.

Dhruv



Re: Markov Decision Process, Hidden Markov Models, GSOC 2011

Posted by Ted Dunning <te...@gmail.com>.

That would be excellent for a GSOC project.
