
Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2011/05/10 13:35:03 UTC

Re: Feedback using Mahout Taste in Master Thesis:

On Tue, May 10, 2011 at 12:24 PM, Manuel Blechschmidt
<Ma...@gmx.de> wrote:
> Hello guys,
> I used Mahout a lot, especially Taste, in my Master Thesis: "An architecture for evaluating recommender systems in real world scenarios". I wanted to give some feedback about it. If somebody is interested in the whole work (97 pages), drop me an email.

Great, thanks for the kudos. It would be good to post a link to your
work on the user@ list if you like.

> It was especially difficult to get the IDMigrator working. It would be quite cool if there were a DataModel that automatically included String ID migration.

This is how it worked originally -- it just doesn't scale nearly as
well. It's really a much better idea to use numeric IDs, so the
framework pushes you that way.
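For readers who start from String IDs anyway, here is a minimal sketch of the idea behind an ID migrator, assuming you hash each String to a 64-bit long and keep a reverse map to translate recommendations back. `SimpleStringIdMapper`, `toLongId` and `toStringId` are hypothetical illustration names, not Mahout's `IDMigrator` API, and the SHA-256 hash is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout's IDMigrator: hashes String IDs to long IDs
// and remembers the reverse mapping so results can be translated back.
final class SimpleStringIdMapper {
  private final Map<Long, String> longToString = new HashMap<>();

  // Hashes the String with SHA-256 (an assumption of this sketch) and packs
  // the first 8 digest bytes into a long.
  static long hash(String stringId) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] bytes = digest.digest(stringId.getBytes(StandardCharsets.UTF_8));
      long value = 0L;
      for (int i = 0; i < 8; i++) {
        value = (value << 8) | (bytes[i] & 0xFFL);
      }
      return value;
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  // Returns the long ID for a String ID and records the reverse mapping.
  long toLongId(String stringId) {
    long id = hash(stringId);
    String previous = longToString.putIfAbsent(id, stringId);
    if (previous != null && !previous.equals(stringId)) {
      // Collisions in 64 bits are very unlikely, but fail loudly if one happens.
      throw new IllegalStateException("ID collision: " + previous + " / " + stringId);
    }
    return id;
  }

  // Translates a long ID back to its String ID, or null if it was never mapped.
  String toStringId(long longId) {
    return longToString.get(longId);
  }
}
```

The in-memory reverse map is one example of the extra cost that String IDs bring compared with using numeric IDs throughout.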

> I had some problems because some interfaces did not implement the Serializable interface. I already opened a ticket, MAHOUT-650.

Yes, interesting issue, though I don't believe a change is called for
in the framework. The issue notes have what I consider the "right" way
to approach this.

> Is there a benchmark engine that reports the RMSE of the different algorithms? It would be cool if a Maven command were available, so that when I implement a new recommender I can directly benchmark it against the other implementations.

RMSE is not a property of an algorithm alone, but of at least an
algorithm and a particular data set. As a result, I don't think this
is possible.
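To make this concrete, here is the usual RMSE definition as a small Java sketch. It takes predictions and actual ratings as inputs, so the same predictor can score very differently on different data. `RmseExample` is a hypothetical name, and the arrays in `main` are made-up toy values, not results from any real data set.

```java
// Hypothetical helper: root-mean-square error over paired predictions and ratings,
// RMSE = sqrt((1/n) * sum over i of (predicted[i] - actual[i])^2).
final class RmseExample {
  static double rmse(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("arrays must have the same, non-zero length");
    }
    double sumOfSquares = 0.0;
    for (int i = 0; i < predicted.length; i++) {
      double error = predicted[i] - actual[i];
      sumOfSquares += error * error;
    }
    return Math.sqrt(sumOfSquares / predicted.length);
  }

  public static void main(String[] args) {
    // One fixed "algorithm" (always predict 3.0) scored on two made-up toy data sets.
    double[] alwaysThree = {3.0, 3.0, 3.0, 3.0};
    double[] toyDataA = {3.0, 3.0, 4.0, 2.0};
    double[] toyDataB = {1.0, 5.0, 1.0, 5.0};
    System.out.println(rmse(alwaysThree, toyDataA)); // about 0.7071 (sqrt of 0.5)
    System.out.println(rmse(alwaysThree, toyDataB)); // 2.0
  }
}
```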

>  * getNumUsersWithPreferenceFor for the MySQL DataModel only works for at most two item IDs, and there is no warning if more are supplied

Maybe this was fixed after you looked, but it does throw an error:
    Preconditions.checkArgument(length != 0 && length <= 2, "Illegal number of item IDs: " + length);
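For anyone unfamiliar with Guava, `Preconditions.checkArgument(boolean, Object)` throws an `IllegalArgumentException` carrying the message when its condition is false. The plain-Java sketch below does the same check by hand; `ItemIdCountCheck` and `checkItemIdCount` are hypothetical names, not part of Mahout.

```java
// Hypothetical stand-in for the quoted guard, written without Guava.
final class ItemIdCountCheck {
  // Accepts one or two item IDs; rejects zero or more than two, as the quoted
  // Preconditions.checkArgument call does.
  static int checkItemIdCount(long... itemIds) {
    int length = itemIds.length;
    if (!(length != 0 && length <= 2)) {
      throw new IllegalArgumentException("Illegal number of item IDs: " + length);
    }
    return length;
  }

  public static void main(String[] args) {
    System.out.println(checkItemIdCount(101L, 102L)); // prints 2
    try {
      checkItemIdCount(101L, 102L, 103L);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints Illegal number of item IDs: 3
    }
  }
}
```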

>  * DataModel expects that there is always only one rating from a user for an item (what about re-ratings?)

Yes, that's true. Only the most recent rating counts. It might be
interesting to find a way to factor in re-ratings, but actually
building that into the framework would cause scale problems, and I
don't know of any algorithms that use it. So maybe it's better to
collapse multiple ratings into one (a weighted average favoring the
most recent one?).
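Collapsing re-ratings before loading them into a DataModel could look roughly like the sketch below, assuming each rating carries a timestamp. The exponential half-life weighting is just one possible way of "favoring the recent one", chosen for illustration; `RatingCollapser`, `Rating` and `halfLife` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: collapses one user's re-ratings of one item into a single value.
final class RatingCollapser {
  // One rating of the item; time is in arbitrary units (e.g. days).
  static final class Rating {
    final double value;
    final long time;

    Rating(double value, long time) {
      this.value = value;
      this.time = time;
    }
  }

  // Weighted average in which a rating's weight halves for every halfLife time
  // units it is older than the newest rating.
  static double collapse(List<Rating> ratings, double halfLife) {
    if (ratings.isEmpty()) {
      throw new IllegalArgumentException("no ratings");
    }
    if (!(halfLife > 0)) {
      throw new IllegalArgumentException("halfLife must be positive");
    }
    long newest = Long.MIN_VALUE;
    for (Rating r : ratings) {
      newest = Math.max(newest, r.time);
    }
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (Rating r : ratings) {
      double age = newest - r.time;
      double weight = Math.pow(0.5, age / halfLife);
      weightedSum += weight * r.value;
      weightTotal += weight;
    }
    return weightedSum / weightTotal;
  }
}
```

With ratings of 2.0 at time 0 and 4.0 at time 10 and a half-life of 10, the weights are 0.5 and 1.0, giving (1.0 + 4.0) / 1.5, about 3.33, which is closer to the newer rating than the plain average of 3.0.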

> I also attached some images which should explain how Taste is doing its job in my system.

(Images aren't included in mail to @apache.org mailing lists; you'd
have to post them elsewhere.)

Re: Feedback using Mahout Taste in Master Thesis:

Posted by Manuel Blechschmidt <Ma...@gmx.de>.

Hi Sean,

On 10.05.2011, at 13:35, Sean Owen wrote:

> On Tue, May 10, 2011 at 12:24 PM, Manuel Blechschmidt
> <Ma...@gmx.de> wrote:
>> Hello guys,
>> 
> ...
>> I also attached some images which should explain how Taste is doing its job in my system.
> 
> (Images aren't included in mail to @apache.org mailing lists; you'd
> have to post them elsewhere.)


Here are the links to the images, in case somebody is interested:

Username: guest
Password: guest

http://manuel.themis02.de/MasterThesisEvalRecommender/trunk/doc/images/SemanticRecommender.pdf
http://manuel.themis02.de/MasterThesisEvalRecommender/trunk/doc/images/SemanticRecommenderDataModel.pdf
http://manuel.themis02.de/MasterThesisEvalRecommender/trunk/doc/images/RecommendationEmail.png
http://manuel.themis02.de/MasterThesisEvalRecommender/trunk/doc/images/RecommenderServices.png
http://manuel.themis02.de/MasterThesisEvalRecommender/trunk/doc/images/C1_Step_3.png

Have a great week
   Manuel
-- 
Manuel Blechschmidt
Hasso-Plattner-Institut
Twitter: http://twitter.com/Manuel_B

Re: Feedback using Mahout Taste in Master Thesis:

Posted by Lance Norskog <go...@gmail.com>.

> Is there a benchmark engine that reports the RMSE of the different algorithms? It would be cool if a Maven command were available, so that when I implement a new recommender I can directly benchmark it against the other implementations.

I'm six months ahead of you on this path. The different recommenders
have different "personalities", so it's hard to compare them directly.
The RecommenderEvaluator lets you measure how well a recommender
trained on one part of a single data file predicts the held-out test
part. It only handles recommenders whose preference values actually
"mean" something on a numeric scale. I did some refactoring and ended
up with:

https://issues.apache.org/jira/browse/MAHOUT-586
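As a rough illustration of the general technique (not of how Mahout's RecommenderEvaluator is implemented), here is a simplified holdout evaluation on one data set, assuming numeric preference values: shuffle, split into training and test parts, "train" a trivial item-mean baseline that falls back to the global mean for unseen items, and report the mean absolute error on the held-out part. `HoldoutSketch`, `Pref` and `evaluate` are hypothetical names, and the baseline merely stands in for a real recommender.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Simplified holdout evaluation on one data set; NOT Mahout's RecommenderEvaluator.
final class HoldoutSketch {
  // One user-item preference with a numeric value (userId is ignored by this baseline).
  static final class Pref {
    final long userId;
    final long itemId;
    final double value;

    Pref(long userId, long itemId, double value) {
      this.userId = userId;
      this.itemId = itemId;
      this.value = value;
    }
  }

  // Shuffles prefs with a fixed seed, trains an item-mean baseline on the first
  // trainingFraction of them and returns the mean absolute error on the rest.
  static double evaluate(List<Pref> prefs, double trainingFraction, long seed) {
    if (prefs.size() < 2) {
      throw new IllegalArgumentException("need at least two preferences");
    }
    List<Pref> shuffled = new ArrayList<>(prefs);
    Collections.shuffle(shuffled, new Random(seed));
    int trainSize = (int) Math.round(shuffled.size() * trainingFraction);
    // Keep both splits non-empty.
    trainSize = Math.max(1, Math.min(trainSize, shuffled.size() - 1));
    List<Pref> training = shuffled.subList(0, trainSize);
    List<Pref> test = shuffled.subList(trainSize, shuffled.size());

    // "Training": per-item sums and counts, plus a global mean as fallback.
    Map<Long, double[]> itemStats = new HashMap<>();
    double globalSum = 0.0;
    for (Pref p : training) {
      double[] stats = itemStats.computeIfAbsent(p.itemId, k -> new double[2]);
      stats[0] += p.value;
      stats[1] += 1.0;
      globalSum += p.value;
    }
    double globalMean = globalSum / training.size();

    // Evaluation: estimate each held-out preference and average the absolute error.
    double absErrorSum = 0.0;
    for (Pref p : test) {
      double[] stats = itemStats.get(p.itemId);
      double estimate = (stats == null) ? globalMean : stats[0] / stats[1];
      absErrorSum += Math.abs(estimate - p.value);
    }
    return absErrorSum / test.size();
  }
}
```

Swapping in a different estimator while keeping the seed fixed compares recommenders on the same split of the same data; the resulting number still describes that pairing of recommender and data set, not the recommender alone.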



-- 
Lance Norskog
goksron@gmail.com