Posted to dev@mahout.apache.org by Andrew Musselman <an...@gmail.com> on 2015/01/10 08:45:39 UTC
Re: Questions about Minhash/SimHash methods
Non-negative matrix factorization would be a good addition; if you can include tests with your pull request it will help.
Assuming this is your PR: https://github.com/apache/mahout/pull/70
Looking forward to more.
> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
>
> Dear sir,
>
> Here is Liang Mingqiang, an undergraduate student, highly interested in recommender systems and Mahout. I have implemented the Non-negative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF) methods and opened a pull request with my code for further comment.
>
> I tested my code on my computer using the MovieLens dataset and got reasonable results. Do I need to write and submit a test module for my code? Since I need a dataset for my tests, may I add some text files to the test package?
>
> In addition, Binary Matrix Factorization (BMF) seems very interesting; I want to contribute my BMF code to Mahout as a next step.
>
> Last but not least, MinHash and SimHash are very popular and useful methods in recommender systems, but looking through the Mahout source code, there seem to be no MinHash or SimHash implementations. Does that mean those methods haven't been contributed, or have I just not checked the source carefully enough? If they have been contributed, would anyone be willing to tell me the path? Thank you!
>
>
> Looking forward,
> ----
> Liang Mingqiang
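[Editor's note: for readers who land on this thread, the two techniques Liang asks about can be sketched briefly. This is an illustrative plain-Python sketch of textbook MinHash and SimHash, not Mahout code and not the code from the PR under discussion.]

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    """MinHash: for each of num_hashes seeded hash functions, keep the
    minimum hash value over the token set. Two sets share a minimum for
    a given hash with probability equal to their Jaccard similarity."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens))
    return sig

def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def simhash(tokens, bits=64):
    """SimHash: for each bit position, sum +1/-1 votes across token hashes;
    the sign of each sum gives one bit of the fingerprint. Similar token
    sets yield fingerprints with a small Hamming distance."""
    counts = [0] * bits
    for t in tokens:
        h = int.from_bytes(hashlib.md5(t.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(bits):
        if counts[i] > 0:
            fp |= 1 << i
    return fp

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

With more hash functions the MinHash estimate of Jaccard similarity tightens; production systems typically band the signature for locality-sensitive lookup rather than comparing signatures pairwise.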
Re: Questions about Minhash/SimHash methods
Posted by Suneel Marthi <su...@gmail.com>.
The new Scala- and Spark-based math DSL is what Ted was alluding to. See:
http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
http://mahout.apache.org/users/sparkbindings/home.html
http://mahout.apache.org/users/sparkbindings/play-with-shell.html
On Sun, Jan 11, 2015 at 7:51 PM, 梁明强 <mq...@gmail.com> wrote:
> Dear Ted Dunning,
>
> Thank you for your reply.
>
> I am new to open source projects; this is the first time I have been
> involved in one, so I have little experience and may need your
> guidance. Your reply is undoubtedly very helpful to me.
>
> As you say, I have only implemented single-machine algorithms, but this
> is just the first step. I am currently learning the Scala programming
> language; as a next step, I will read some papers on scalable
> algorithms and try to implement them.
>
> In addition, what do you mean by "the new math framework" here?
>
>
> Best regards,
> Liang Mingqiang.
>
>
Re: Questions about Minhash/SimHash methods
Posted by Ted Dunning <te...@gmail.com>.
On Sun, Jan 11, 2015 at 6:51 PM, 梁明强 <mq...@gmail.com> wrote:
> In addition, what you mean "the new math framework" here?
>
Mahout has a new math framework written in Scala that parallelizes
mathematical operations.
Re: Questions about Minhash/SimHash methods
Posted by 梁明强 <mq...@gmail.com>.
Dear Ted Dunning,
Thank you for your reply.
I am new to open source projects; this is the first time I have been
involved in one, so I have little experience and may need your guidance.
Your reply is undoubtedly very helpful to me.
As you say, I have only implemented single-machine algorithms, but this is
just the first step. I am currently learning the Scala programming
language; as a next step, I will read some papers on scalable algorithms
and try to implement them.
In addition, what do you mean by "the new math framework" here?
Best regards,
Liang Mingqiang.
Re: Questions about Minhash/SimHash methods
Posted by Ted Dunning <te...@gmail.com>.
I just looked a little bit and have a few questions.
First, these appear to be java implementations for a single machine. How scalable is that? How would it interact with the new math framework?
Second, there are a number of style issues like author tags, indentation, and such, but what I find most troubling is an almost complete lack of Javadoc and a complete lack of comments about the origin of the algorithms being used, or of non-trivial comments about what is happening in the code. I see comments on sections like "update w". That doesn't say anything that the code doesn't say.
Sent from my iPhone
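[Editor's note: the "update w" step Ted criticizes is presumably the multiplicative update rule of Lee and Seung, the standard NMF algorithm. The kind of comment he is asking for might look like this hedged NumPy sketch, which is illustrative only and not the code from the PR under discussion.]

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Factor a non-negative matrix V (m x n) into W (m x k) @ H (k x n)
    using the Lee-Seung multiplicative update rules, which keep all
    entries non-negative and monotonically reduce the Frobenius error
    ||V - W @ H||."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # "update H": H <- H * (W^T V) / (W^T W H); eps avoids division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # "update W": W <- W * (V H^T) / (W H H^T)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The point of the comments is to name the rule and its source, not merely to restate the code; a reviewer can then check each line against the published update equations.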
> On Jan 10, 2015, at 1:45, Andrew Musselman <an...@gmail.com> wrote:
>
> Non-negative matrix factorization would be a good addition; if you can include tests with your pull request it will help.
>
> Assuming this is your PR: https://github.com/apache/mahout/pull/70
>
> Looking forward to more.
>
>> On Jan 9, 2015, at 11:21 PM, 梁明强 <mq...@gmail.com> wrote:
>>
>> Dear sir,
>>
>> Here is Liang Mingqiang, an undergraduate student, highly interested in Recommender System and Mahout. I have implete Non-negative Matrix Factorization(NMF) and Probabilistic Matrix Factorization(PMF) method and pull request my code for further comment.
>>
>> I test my code on my computer using movielens dataset and get reasonable result. Do I need to write and submit a test module for my code. Just because I need dataset for my test, can I add some text files in the test package?
>>
>> In addition, Binary Matrix Factorization seems(BMF) very interesting, I want contribute my BMF code for Mahout in the next step.
>>
>> Last, but not least, Minhash and SimHash are very popular and useful methods in Recommender System. But I look through the source code of Mahout, there seems no Minhash and SimHash method. Does it mean those methods haven't been contributed or just because I haven't check the source code carefully. If those two methods have benn contributed, is there anyone willing to tell me the path. Thank you!
>>
>>
>> Looking forward,
>> ----
>> Liang Mingqiang