Posted to user@mahout.apache.org by Guillaume Billard <gb...@krysalide.fr> on 2010/05/11 15:02:07 UTC

Several questions about Mahout

Hello,

My company is looking into creating a website for clothes shopping built around a recommendation engine. User criteria would be past purchases and items that have been looked at (à la Amazon), measurements, personal tastes (colours, clothing styles...).
Item criteria would be colours, clothing styles, etc.

We have not yet decided whether to build our own engine or to use an existing one.

1/ How active is Mahout development? The last commit is very recent if I'm not mistaken. Can we expect improvements/new features in the future?

2/ I believe Mahout can be used in commercial software, free of charge, thanks to the Apache license. Am I mistaken?

3/ How long did it take to develop Mahout?

4/ Do you reckon it is a good idea to try and use Mahout for the project I described?

5/ How long do you reckon it would take for a programmer used to Java & database management to get used to the framework and use it effectively in a real-world app?

Sorry if these are vague questions, I'm just starting to research this subject!
Thanks!

Guillaume


Re: Several questions about Mahout

Posted by Isabel Drost <is...@apache.org>.
On Tue Robin Anil <ro...@gmail.com> wrote:

> > 3/ How long did it take to develop Mahout?
> 
> It's been active for 3 years now

One additional bit of information: the recommender code base was
contributed to Mahout early on, but already existed before that as a
project named Taste.

If you end up using Mahout, it would be great if you could add
yourself to our "Powered by" wiki page. It is always great to show new
users who else is using Mahout: https://cwiki.apache.org/MAHOUT/poweredby.html

Isabel

Re: Several questions about Mahout

Posted by Sean Owen <sr...@gmail.com>.
On Tue, May 11, 2010 at 2:39 PM, Robin Anil <ro...@gmail.com> wrote:
>> 1/ How active is Mahout development? The last commit is very recent if I'm
>> not mistaken. Can we expect improvements/new features in the future?

Very active. I myself committed twice today. There are a number of
people committing regularly.
In fact I'd almost describe that as a warning: things are changing
significantly and often.

Here are the current open issues, to give a sense of what may be coming:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310751&fixfor=12314396


>> 2/ I believe Mahout can be used in a commercial software, free of charge,
>> due to the Apache license. Am I mistaken?
> http://www.apache.org/foundation/licence-FAQ.html

Yes it's Apache licensed and as the link says, the license does not
forbid commercial use.


>> 4/ Do you reckon it is a good idea to try and use Mahout for the project I
>> described?

Yes, though based on your description you're really doing two things:

1. Recommending items based on user ratings and behavior

2a. Artificially boosting items that match tastes
2b. Filtering out unsuitable items

1 is core collaborative filtering, which is definitely supported.
2 is really just some filtering, not CF or even really content-based
recommendation. It's not hard, and there's some support (Rescorer),
though it's so domain-specific that the filtering logic is mostly up
to you to write.
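
To make the two stages above concrete, here is a minimal sketch in plain
Java. It deliberately does not use Mahout's classes: ItemRescorer, the
overlap-based scoring and the sample data are all hypothetical, made up
for illustration, and a real recommender would use a proper similarity
measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: plain Java with hypothetical names, no Mahout classes.
class RescoringSketch {

    /** Hypothetical hook for the domain-specific rules (steps 2a and 2b). */
    interface ItemRescorer {
        boolean isFiltered(String item);            // 2b: drop unsuitable items
        double rescore(String item, double score);  // 2a: boost items matching tastes
    }

    /** Step 1: each other user votes for their items, weighted by purchases shared with the target user. */
    static Map<String, Double> collaborativeScores(Map<String, Set<String>> purchases, String user) {
        Set<String> own = purchases.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : purchases.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            long overlap = other.getValue().stream().filter(own::contains).count();
            for (String item : other.getValue()) {
                if (overlap > 0 && !own.contains(item)) {
                    scores.merge(item, (double) overlap, Double::sum);
                }
            }
        }
        return scores;
    }

    /** Steps 1 and 2 combined: filter, rescore, then rank by descending score (ties by name). */
    static List<String> recommend(Map<String, Set<String>> purchases, String user,
                                  ItemRescorer rescorer, int howMany) {
        Map<String, Double> finalScores = new HashMap<>();
        collaborativeScores(purchases, user).forEach((item, score) -> {
            if (!rescorer.isFiltered(item)) {
                finalScores.put(item, rescorer.rescore(item, score));
            }
        });
        List<String> ranked = new ArrayList<>(finalScores.keySet());
        ranked.sort(Comparator.comparingDouble((String item) -> finalScores.get(item))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    /** Made-up purchase history for three shoppers. */
    static Map<String, Set<String>> samplePurchases() {
        Map<String, Set<String>> purchases = new HashMap<>();
        purchases.put("alice", new HashSet<>(Arrays.asList("jeans", "white-shirt")));
        purchases.put("bob", new HashSet<>(Arrays.asList("jeans", "white-shirt", "red-dress", "wool-coat")));
        purchases.put("carol", new HashSet<>(Arrays.asList("jeans", "red-dress", "sandals")));
        return purchases;
    }

    /** Made-up rules: hide items not stocked in the shopper's size, double items in a favourite colour. */
    static ItemRescorer sampleRescorer() {
        Set<String> notInSize = new HashSet<>(Arrays.asList("red-dress"));
        Set<String> favouriteColour = new HashSet<>(Arrays.asList("wool-coat"));
        return new ItemRescorer() {
            @Override public boolean isFiltered(String item) {
                return notInSize.contains(item);
            }
            @Override public double rescore(String item, double score) {
                return favouriteColour.contains(item) ? score * 2 : score;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(recommend(samplePurchases(), "alice", sampleRescorer(), 3));
    }
}
```

With this sample data, plain collaborative filtering would rank red-dress
first for alice; the rescorer removes it and lifts wool-coat above
sandals. Keeping the domain rules behind one small interface means the
collaborative part can be swapped out without touching them.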


>> 5/ How long do you reckon it would take for a programmer used to Java &
>> database management to get used to the framework and use it effectively in a
>> real-world app?

I don't think the recommender is hard to understand. Maybe a couple
days of playing with it? Recommenders are minor-league rocket science
compared to most machine learning. The non-distributed code is
particularly simple... this is about as much as you need to know to
get going: http://lucene.apache.org/mahout/taste.html

The distributed implementation is much newer and more complex, and
requires you to understand MapReduce and Hadoop. The best writeup at the
moment is in Mahout in Action (plug). It may take a good week or two
to get used to Hadoop and how the recommender works.

Re: Several questions about Mahout

Posted by Robin Anil <ro...@gmail.com>.
Hi Guillaume
On Tue, May 11, 2010 at 6:32 PM, Guillaume Billard <gb...@krysalide.fr> wrote:

> Hello,
>
> My company is looking into creating a website for clothes shopping built
> around a recommendation engine. User criteria would be past purchases and
> items that have been looked at (à la Amazon), measurements, personal tastes
> (colours, clothing styles...).
> Items criteria would be colours, clothing styles, etc.
>
> It is not decided yet if we try and build our own engine or if we can
> benefit from an existing one.
>
> 1/ How active is Mahout development? The last commit is very recent if I'm
> not mistaken. Can we expect improvements/new features in the future?
>
http://fisheye6.atlassian.com/reports/mahout
Yes, tons of them: 5 cool new algorithms to come this summer from GSoC
projects alone, and plenty of improvements to the core structures.

>
> 2/ I believe Mahout can be used in a commercial software, free of charge,
> due to the Apache license. Am I mistaken?

http://www.apache.org/foundation/licence-FAQ.html

> 3/ How long did it take to develop Mahout?

It's been active for 3 years now.

> 4/ Do you reckon it is a good idea to try and use Mahout for the project I
> described?

YES (though as a developer, it's a biased opinion). See the code, browse the
wiki.

> 5/ How long do you reckon it would take for a programmer used to Java &
> database management to get used to the framework and use it effectively in a
> real-world app?

Depends on what you mean by "effectively". You can get up and running very
quickly (single command line) provided you have transformed your data into the
correct format. You might need to spend a little time understanding that
structure. That's it.

> Sorry if these are vague questions, I'm just starting researching on this
> subject!
>
These are some vague answers :) But these links might be able to answer
them in more detail.

Robin

Re: Several questions about Mahout

Posted by Sebastian Schelter <se...@zalando.de>.
Hello Guillaume,

I'm not a Mahout committer, so I can't answer all of your questions, but
I'm currently writing my diploma thesis about evaluating several
recommendation approaches. The company I'm doing this for is a big
German fashion store, so I feel we kinda share the same use case and
maybe you can profit from the things I found out so far :)

> 1/ How active is Mahout development? The last commit is very recent if I'm not mistaken. Can we expect improvements/new features in the future?
>   

Actually very active. I filed some bug reports in the past, all of which
were addressed immediately (usually on the same or the next day).

> 4/ Do you reckon it is a good idea to try and use Mahout for the project I described?
>   
I think it's a good idea, as I'm using it as the basis for the software
of my diploma thesis :) Mahout has support for Collaborative Filtering
(users who like this also like...), which I currently use with data about
views and sales. It has no explicit support for content-based
recommendation (where you consider the features of items, like color etc.)
AFAIK, but I think it offers tools that can aid you in building a
content-based recommender. One idea I'm currently evaluating is
extracting the most-liked feature combinations of each user with the
Frequent Itemset Mining offered by Mahout. These combinations could then
be used to find the products that match them best using a Lucene search.
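
The feature-combination idea above can be sketched in a few lines of
plain Java. This is only an illustration under loud assumptions: naive
pair counting stands in for Mahout's frequent itemset mining, a simple
containment check stands in for the Lucene search, and every class,
method and feature name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: brute-force pair counting stands in for Mahout's
// frequent itemset mining, and a containment check stands in for a Lucene query.
class FeatureComboSketch {

    /** Returns the feature pairs that occur together in at least minSupport liked items. */
    static Set<List<String>> frequentPairs(List<Set<String>> likedItems, int minSupport) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Set<String> features : likedItems) {
            // Sort so that each unordered pair always maps to the same key.
            List<String> sorted = new ArrayList<>(new TreeSet<>(features));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(Arrays.asList(sorted.get(i), sorted.get(j)), 1, Integer::sum);
                }
            }
        }
        Set<List<String>> frequent = new HashSet<>();
        counts.forEach((pair, count) -> {
            if (count >= minSupport) {
                frequent.add(pair);
            }
        });
        return frequent;
    }

    /** Scores a product by how many of the user's frequent pairs it contains. */
    static int matchScore(Set<String> productFeatures, Set<List<String>> frequent) {
        int score = 0;
        for (List<String> pair : frequent) {
            if (productFeatures.containsAll(pair)) {
                score++;
            }
        }
        return score;
    }

    /** Made-up feature sets for three items that one user liked. */
    static List<Set<String>> sampleLikedItems() {
        List<Set<String>> liked = new ArrayList<>();
        liked.add(new HashSet<>(Arrays.asList("blue", "casual", "cotton")));
        liked.add(new HashSet<>(Arrays.asList("blue", "casual", "denim")));
        liked.add(new HashSet<>(Arrays.asList("red", "formal", "silk")));
        return liked;
    }

    public static void main(String[] args) {
        Set<List<String>> frequent = frequentPairs(sampleLikedItems(), 2);
        Set<String> candidate = new HashSet<>(Arrays.asList("blue", "casual", "linen"));
        System.out.println(frequent + " -> score " + matchScore(candidate, frequent));
    }
}
```

For the sample user, only the pair (blue, casual) reaches a support of 2,
so a blue casual linen item scores 1 while a red casual item scores 0.
Pair counting grows quadratically with the number of features per item,
which is one reason a dedicated mining algorithm is the better fit on
real data.
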
> 5/ How long do you reckon it would take for a programmer used to Java & database management to get used to the framework and use it effectively in a real-world app?
>   

I can answer this from personal experience:

From a theoretical point of view, I had to read some papers on
Collaborative Filtering to get the theory behind that recommendation
approach, but that isn't too hard. To get it done in practice, I had
to dive into the source code of Taste, which I found not too hard a thing
to do either.


I hope this gives you a little more insight,
Sebastian