Posted to dev@mahout.apache.org by Andrew Musselman <an...@gmail.com> on 2015/01/10 08:45:39 UTC

Re: Questions about Minhash/SimHash methods

Non-negative matrix factorization would be a good addition; if you can include tests with your pull request it will help.
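
For example, the test does not have to ship a full dataset; a tiny synthetic matrix generated inside the test is usually enough to show the factorization behaves (reconstruction error goes down, factors stay non-negative). Here is a rough, self-contained sketch of that idea in plain Scala; the names and numbers are purely illustrative and not taken from the PR:

    import scala.util.Random

    object NmfSmokeTest {

      // Frobenius reconstruction error ||V - W H||^2
      def error(v: Array[Array[Double]],
                w: Array[Array[Double]],
                h: Array[Array[Double]]): Double = {
        var e = 0.0
        for (i <- v.indices; j <- v(i).indices) {
          val approx = w(i).indices.map(x => w(i)(x) * h(x)(j)).sum
          val d = v(i)(j) - approx
          e += d * d
        }
        e
      }

      def main(args: Array[String]): Unit = {
        val rnd = new Random(42)
        val (m, n, k) = (6, 4, 2)

        // Tiny synthetic "ratings" matrix, generated inside the test itself
        val v = Array.fill(m, n)(rnd.nextInt(5) + 1.0)
        var w = Array.fill(m, k)(rnd.nextDouble() + 0.1)
        var h = Array.fill(k, n)(rnd.nextDouble() + 0.1)

        val before = error(v, w, h)

        // Lee & Seung multiplicative updates (unoptimized, illustration only)
        for (_ <- 1 to 50) {
          // H <- H .* (W^t V) ./ (W^t W H)
          h = Array.tabulate(k, n) { (a, j) =>
            val num = (0 until m).map(i => w(i)(a) * v(i)(j)).sum
            val den = (0 until m).map(i =>
              w(i)(a) * (0 until k).map(b => w(i)(b) * h(b)(j)).sum).sum + 1e-9
            h(a)(j) * num / den
          }
          // W <- W .* (V H^t) ./ (W H H^t)
          w = Array.tabulate(m, k) { (i, a) =>
            val num = (0 until n).map(j => v(i)(j) * h(a)(j)).sum
            val den = (0 until n).map(j =>
              (0 until k).map(b => w(i)(b) * h(b)(j)).sum * h(a)(j)).sum + 1e-9
            w(i)(a) * num / den
          }
        }

        val after = error(v, w, h)
        assert(after < before, s"reconstruction error should decrease: $before -> $after")
        println(s"reconstruction error: $before -> $after")
      }
    }

Exercising the PR's own classes in the same way would avoid having to check data files into the test package.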

Assuming this is your PR:  https://github.com/apache/mahout/pull/70

Looking forward to more.

> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
> 
> Dear sir,
> 
> This is Liang Mingqiang, an undergraduate student who is highly interested in recommender systems and Mahout. I have implemented the Non-negative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF) methods and opened a pull request with my code for further comment.
> 
> I tested my code on my computer using the MovieLens dataset and got reasonable results. Do I need to write and submit a test module for my code? Since I need a dataset for my test, can I add some text files to the test package?
> 
> In addition, Binary Matrix Factorization (BMF) seems very interesting; I want to contribute my BMF code to Mahout as a next step.
> 
> Last, but not least, MinHash and SimHash are very popular and useful methods in recommender systems. But looking through the Mahout source code, there seem to be no MinHash or SimHash implementations. Does that mean those methods haven't been contributed, or have I just not checked the source code carefully enough? If those two methods have been contributed, would anyone be willing to tell me the path? Thank you!
> 
> 
> Looking forward,
> ----
> Liang Mingqiang

Re: Questions about Minhash/SimHash methods

Posted by Suneel Marthi <su...@gmail.com>.
The new Scala and Spark based Math DSL is what Ted was alluding to.

See http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
    http://mahout.apache.org/users/sparkbindings/home.html
    http://mahout.apache.org/users/sparkbindings/play-with-shell.html



On Sun, Jan 11, 2015 at 7:51 PM, 梁明强 <mq...@gmail.com> wrote:

> Dear Ted Dunning,
>
> Thank you for your reply.
>
> I am new to open source; this is the first time I have been involved in an
> open source project. I have little experience and may need your guidance.
> Your reply is undoubtedly very helpful for me.
>
> As you say, I have only implemented a single-machine algorithm, but this is
> just the first step. I am currently learning the Scala programming language;
> as a next step, I will read some papers on scalable algorithms and try to
> implement one.
>
> In addition, what do you mean by "the new math framework" here?
>
>
> Best regards,
> Liang Mingqiang.
>
>
> 2015-01-11 23:37 GMT+09:00 Ted Dunning <te...@gmail.com>:
>
> >
> > I just looked a little bit and have a few questions.
> >
> > First, these appear to be Java implementations for a single machine. How
> > scalable is that? How would it interact with the new math framework?
> >
> > Second, there are a number of style issues like author tags, indentation,
> > and such, but what I find most troubling is an almost complete lack of
> > Javadoc and a complete lack of comments about the origin of the algorithms
> > being used, or non-trivial comments about what is happening in the code.
> > I see comments on sections like "update w". That doesn't say anything that
> > the code doesn't say.
> >
> > Sent from my iPhone
> >
> > > On Jan 10, 2015, at 1:45, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > Non-negative matrix factorization would be a good addition; if you can
> > include tests with your pull request it will help.
> > >
> > > Assuming this is your PR:  https://github.com/apache/mahout/pull/70
> > >
> > > Looking forward to more.
> > >
> > >> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
> > >>
> > >> Dear sir,
> > >>
> > >> This is Liang Mingqiang, an undergraduate student who is highly
> > >> interested in recommender systems and Mahout. I have implemented the
> > >> Non-negative Matrix Factorization (NMF) and Probabilistic Matrix
> > >> Factorization (PMF) methods and opened a pull request with my code for
> > >> further comment.
> > >>
> > >> I tested my code on my computer using the MovieLens dataset and got
> > >> reasonable results. Do I need to write and submit a test module for my
> > >> code? Since I need a dataset for my test, can I add some text files to
> > >> the test package?
> > >>
> > >> In addition, Binary Matrix Factorization (BMF) seems very interesting;
> > >> I want to contribute my BMF code to Mahout as a next step.
> > >>
> > >> Last, but not least, MinHash and SimHash are very popular and useful
> > >> methods in recommender systems. But looking through the Mahout source
> > >> code, there seem to be no MinHash or SimHash implementations. Does that
> > >> mean those methods haven't been contributed, or have I just not checked
> > >> the source code carefully enough? If those two methods have been
> > >> contributed, would anyone be willing to tell me the path? Thank you!
> > >>
> > >>
> > >> Looking forward,
> > >> ----
> > >> Liang Mingqiang
> >
>

Re: Questions about Minhash/SimHash methods

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Jan 11, 2015 at 6:51 PM, 梁明强 <mq...@gmail.com> wrote:

> In addition, what do you mean by "the new math framework" here?
>

Mahout has a new math framework, written in Scala, that parallelizes
mathematical operations.
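
For a feel of it, here is a minimal sketch in the style of the Spark bindings examples; it assumes you are inside the Mahout spark-shell, where a distributed context is already in scope:

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    // A small in-core matrix, then its distributed (DRM) counterpart
    val inCoreA = dense((1, 2, 3), (3, 4, 5), (5, 6, 7))
    val drmA = drmParallelize(inCoreA, numPartitions = 2)

    // A^t %*% A is written once; the optimizer decides how to run it as Spark jobs
    val drmAtA = drmA.t %*% drmA

    // Nothing executes until an action; collect brings the result back in core
    val inCoreAtA = drmAtA.collect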

Re: Questions about Minhash/SimHash methods

Posted by 梁明强 <mq...@gmail.com>.
Dear Ted Dunning,

Thank you for your reply.

I am new to open source; this is the first time I have been involved in an
open source project. I have little experience and may need your guidance.
Your reply is undoubtedly very helpful for me.

As you say, I have only implemented a single-machine algorithm, but this is
just the first step. I am currently learning the Scala programming language;
as a next step, I will read some papers on scalable algorithms and try to
implement one.

In addition, what do you mean by "the new math framework" here?


Best regards,
Liang Mingqiang.


2015-01-11 23:37 GMT+09:00 Ted Dunning <te...@gmail.com>:

>
> I just looked a little bit and have a few questions.
>
> First, these appear to be Java implementations for a single machine. How
> scalable is that? How would it interact with the new math framework?
>
> Second, there are a number of style issues like author tags, indentation,
> and such, but what I find most troubling is an almost complete lack of
> Javadoc and a complete lack of comments about the origin of the algorithms
> being used, or non-trivial comments about what is happening in the code.
> I see comments on sections like "update w". That doesn't say anything that
> the code doesn't say.
>
> Sent from my iPhone
>
> > On Jan 10, 2015, at 1:45, Andrew Musselman <an...@gmail.com>
> wrote:
> >
> > Non-negative matrix factorization would be a good addition; if you can
> include tests with your pull request it will help.
> >
> > Assuming this is your PR:  https://github.com/apache/mahout/pull/70
> >
> > Looking forward to more.
> >
> >> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
> >>
> >> Dear sir,
> >>
> >> This is Liang Mingqiang, an undergraduate student who is highly
> >> interested in recommender systems and Mahout. I have implemented the
> >> Non-negative Matrix Factorization (NMF) and Probabilistic Matrix
> >> Factorization (PMF) methods and opened a pull request with my code for
> >> further comment.
> >>
> >> I tested my code on my computer using the MovieLens dataset and got
> >> reasonable results. Do I need to write and submit a test module for my
> >> code? Since I need a dataset for my test, can I add some text files to
> >> the test package?
> >>
> >> In addition, Binary Matrix Factorization (BMF) seems very interesting;
> >> I want to contribute my BMF code to Mahout as a next step.
> >>
> >> Last, but not least, MinHash and SimHash are very popular and useful
> >> methods in recommender systems. But looking through the Mahout source
> >> code, there seem to be no MinHash or SimHash implementations. Does that
> >> mean those methods haven't been contributed, or have I just not checked
> >> the source code carefully enough? If those two methods have been
> >> contributed, would anyone be willing to tell me the path? Thank you!
> >>
> >>
> >> Looking forward,
> >> ----
> >> Liang Mingqiang
>

Re: Questions about Minhash/SimHash methods

Posted by Ted Dunning <te...@gmail.com>.
I just looked a little bit and have a few questions.

First, these appear to be Java implementations for a single machine. How scalable is that? How would it interact with the new math framework?

Second, there are a number of style issues like author tags, indentation, and such, but what I find most troubling is an almost complete lack of Javadoc and a complete lack of comments about the origin of the algorithms being used, or non-trivial comments about what is happening in the code. I see comments on sections like "update w". That doesn't say anything that the code doesn't say.
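
To make that concrete, the kind of header comment that would help looks roughly like this; it is sketched in Scala with made-up names rather than anything taken from the PR, and the point is the reference to where the update rule comes from and what it preserves:

    import org.apache.mahout.math.Matrix
    import org.apache.mahout.math.scalabindings.RLikeOps._

    object NmfUpdates {
      /**
       * One multiplicative update of the basis matrix W, following
       * Lee & Seung, "Algorithms for Non-negative Matrix Factorization"
       * (NIPS 2001):
       *
       *   W <- W * (V H^t) / (W H H^t)   where * and / are element-wise
       *
       * The update keeps W non-negative and does not increase the
       * Frobenius reconstruction error ||V - W H||^2.
       * (Production code would also guard against zero denominators.)
       */
      def updateW(v: Matrix, w: Matrix, h: Matrix): Matrix =
        w * (v %*% h.t) / (w %*% (h %*% h.t))
    }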

Sent from my iPhone

> On Jan 10, 2015, at 1:45, Andrew Musselman <an...@gmail.com> wrote:
> 
> Non-negative matrix factorization would be a good addition; if you can include tests with your pull request it will help.
> 
> Assuming this is your PR:  https://github.com/apache/mahout/pull/70
> 
> Looking forward to more.
> 
>> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
>> 
>> Dear sir,
>> 
>> This is Liang Mingqiang, an undergraduate student who is highly interested in recommender systems and Mahout. I have implemented the Non-negative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF) methods and opened a pull request with my code for further comment.
>> 
>> I tested my code on my computer using the MovieLens dataset and got reasonable results. Do I need to write and submit a test module for my code? Since I need a dataset for my test, can I add some text files to the test package?
>> 
>> In addition, Binary Matrix Factorization (BMF) seems very interesting; I want to contribute my BMF code to Mahout as a next step.
>> 
>> Last, but not least, MinHash and SimHash are very popular and useful methods in recommender systems. But looking through the Mahout source code, there seem to be no MinHash or SimHash implementations. Does that mean those methods haven't been contributed, or have I just not checked the source code carefully enough? If those two methods have been contributed, would anyone be willing to tell me the path? Thank you!
>> 
>> 
>> Looking forward,
>> ----
>> Liang Mingqiang