Posted to user@mahout.apache.org by António Lemos <ar...@gmail.com> on 2011/02/22 17:09:19 UTC

Question about Mahout capabilities

Hi,

My name is António Lemos and I’m from Portugal.

I’m currently responding to an RFP for a web portal development project and
I’m especially focused on the recommendation part.

I’m completely new to recommendation and collaborative filtering solutions
but from what I was able to understand Apache Mahout seems to address my
requirements.

First things first: from what I understood, Mahout is a scalable machine
learning library that does (among other things) classification and
collaborative filtering (CF). The CF part is provided by Taste, which
supports both item-based and user-based recommendations.

My first question is basically whether Mahout can be executed in real time or
as a batch process.

Now, from the RFP requirements, the portal should be able to produce and
manage recommendations, namely:

- Produce recommendations based on explicit rules or explicit profile data;

- Create profiles based on user browsing and use that information in
  suggestion rules (e.g. If a specific user reads articles about Alex
  Ferguson I can infer that he is a Manchester United supporter and thus
  the recommendation engine will recommend articles related to Man. Utd.);

- Produce recommendations based on content views or research;

- Collaborative filtering capabilities and real-time analytic models;

Do you know if Mahout addresses all these requirements? I’m searching for
articles related to Mahout integration (in
https://cwiki.apache.org/confluence/display/MAHOUT/MahoutIntegration this
section is blank), namely integration with the WCM we are going to propose
(EZ Publish), but I was not able to find anything. Do you have any insights
about how Mahout integrates with WCM solutions?

My last question, which I think is probably related to the previous one, is
about Mahout integration with user session logging. Let’s say an anonymous
user browses articles A1 and A2 and rates items I1 and I2. At this point the
recommendation engine should be able to produce some recommendations for
related articles and items (explicit or not). What happens to the user’s
preferences when the anonymous user decides to log in to the website? Are
they included in the data model and thus related to that user, or are they
lost?


Regards,

António Lemos

Re: Question about Mahout capabilities

Posted by Sean Owen <sr...@gmail.com>.
You can create an item-item similarity metric (ItemSimilarity) that
implements whatever rule you want, yes. Then you can use it with an
item-based recommender. In that sense it does, but of course you have
to write the rule yourself; it doesn't exist in the project.
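
To make the idea of a hand-written similarity rule more concrete, here is a
minimal plain-Java sketch. It deliberately does not use Mahout's classes:
ItemRule, sharedTopicRule, recommend and the topic tags are hypothetical names
invented for illustration, and the "best match wins" ranking is an assumption,
not how Mahout's item-based recommender scores items.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only; none of these names come from Mahout. */
final class RuleBasedSimilaritySketch {

    /** Hypothetical stand-in for the idea of an item-item similarity metric. */
    interface ItemRule {
        double similarity(long itemA, long itemB);
    }

    /** Explicit rule: two articles are similar (1.0) if they share at least one topic tag. */
    static ItemRule sharedTopicRule(Map<Long, Set<String>> topicsByItem) {
        return (a, b) -> {
            Set<String> topicsA = topicsByItem.getOrDefault(a, Collections.emptySet());
            Set<String> topicsB = topicsByItem.getOrDefault(b, Collections.emptySet());
            return Collections.disjoint(topicsA, topicsB) ? 0.0 : 1.0;
        };
    }

    /** Ranks unread candidates by their best similarity to anything the user has read. */
    static List<Long> recommend(Set<Long> read, Collection<Long> candidates,
                                ItemRule rule, int howMany) {
        Map<Long, Double> scores = new HashMap<>();
        for (long candidate : candidates) {
            if (read.contains(candidate)) {
                continue; // never recommend what was already read
            }
            double best = 0.0;
            for (long seen : read) {
                best = Math.max(best, rule.similarity(seen, candidate));
            }
            if (best > 0.0) {
                scores.put(candidate, best);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : Long.compare(x, y);         // ties: lower ID first
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

With a Turkey article and an Egypt article both tagged with the same topic,
reading the first one surfaces the second, which is the kind of explicit rule
asked about in this thread.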

It also has a Rescorer abstraction, which lets a caller filter or
reorder recommendations according to whatever logic you can write in
Java code.
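
As a rough illustration of the filter-or-reorder idea, the following plain-Java
sketch applies a caller-supplied hook to a set of already scored items. The
ScoreAdjuster interface and the apply method are hypothetical names for this
example only; check Mahout's Javadoc for the real Rescorer types and their
signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; this is not Mahout's Rescorer API. */
final class RescoringSketch {

    /** Hypothetical hook: a caller may drop an item or change its score. */
    interface ScoreAdjuster {
        boolean isFiltered(long itemId);
        double rescore(long itemId, double originalScore);
    }

    /** Drops filtered items, rescores the rest, and returns item IDs best-first. */
    static List<Long> apply(Map<Long, Double> scored, ScoreAdjuster adjuster) {
        Map<Long, Double> adjusted = new HashMap<>();
        for (Map.Entry<Long, Double> entry : scored.entrySet()) {
            long itemId = entry.getKey();
            if (adjuster.isFiltered(itemId)) {
                continue; // business rule says: never show this item
            }
            adjusted.put(itemId, adjuster.rescore(itemId, entry.getValue()));
        }
        List<Long> ranked = new ArrayList<>(adjusted.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(adjusted.get(y), adjusted.get(x)); // higher score first
            return byScore != 0 ? byScore : Long.compare(x, y);             // ties: lower ID first
        });
        return ranked;
    }
}
```

An adjuster could, for example, filter out articles that are no longer
published and boost articles from a section the editors want to promote.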

2011/2/22 António Lemos <ar...@gmail.com>:
> Hi Sean,
>
> Thanks a lot for your fast reply.
> Regarding the recommendations based on explicit rules, basically it's
> something like, if someone reads an article about incidents in Turkey I want
> an explicit recommendation to the article related to incidents in Egypt.
> Does Mahout handle explicit rules?
>
> Best regards,
> AL

Re: Question about Mahout capabilities

Posted by Sean Owen <sr...@gmail.com>.
2011/2/22 António Lemos <ar...@gmail.com>:
> My first question is basically whether Mahout can be executed in real time or
> as a batch process.

Both. The "Taste" bit is all real-time. (Of course, you can use a
real-time engine to generate recommendations in a background batch
process if you like.)

There is also a Hadoop-based implementation, which is quite separate.
That is much more scalable but is necessarily a batch-oriented
process, not real-time.
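
For the first case (a real-time engine driven from a background batch job),
the pattern might look like the sketch below. It is plain Java and assumes the
engine can be wrapped as a function from user ID to a list of item IDs;
BatchPrecomputeSketch, refreshAll and recommendationsFor are hypothetical
names, and nothing here reflects the Hadoop-based implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Illustrative only: caches results from any "user ID -> item IDs" engine. */
final class BatchPrecomputeSketch {

    private final Map<Long, List<Long>> cache = new ConcurrentHashMap<>();
    private final LongFunction<List<Long>> engine;

    BatchPrecomputeSketch(LongFunction<List<Long>> engine) {
        this.engine = engine;
    }

    /** Batch mode: call periodically (e.g. from a scheduler) to refresh every user's list. */
    void refreshAll(Iterable<Long> userIds) {
        for (long userId : userIds) {
            cache.put(userId, engine.apply(userId));
        }
    }

    /** Serving: use the batch result when present, otherwise compute on demand. */
    List<Long> recommendationsFor(long userId) {
        List<Long> cached = cache.get(userId);
        return cached != null ? cached : engine.apply(userId);
    }
}
```

The trade-off is the usual one: cached lists are cheap to serve but can be
stale until the next refresh, while on-demand calls see the latest data.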



> - Produce recommendations based on explicit rules or explicit profile data;
>
> - Create profiles based on user browsing and use that information in
>   suggestion rules (e.g. If a specific user reads articles about Alex
>   Ferguson I can infer that he is a Manchester United supporter and thus
>   the recommendation engine will recommend articles related to Man. Utd.);
>
> - Produce recommendations based on content views or research;
>
> - Collaborative filtering capabilities and real-time analytic models;

These aren't really specific enough to say much about. The real
question is: does Mahout have enough hooks and extension points to add
the business-specific logic you mention above? Yes, I think you'll
find there is a way to implement your logic within the framework.

For example, it's up to you to decide how "content views or research"
translates into user-item preferences. That's not something Mahout can
do for you. But, given those preferences, it can make recommendations
easily.
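
One possible way to do that translation in your own application is sketched
below in plain Java. Everything in it is an assumption made for illustration:
the Action types, the weights (1.0 per view, 2.0 per search click), the cap of
5.0, and the policy that an explicit rating overrides implicit signals. None of
it comes from Mahout.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: turns raw portal events into user-item preference values. */
final class ImplicitPreferenceSketch {

    enum Action { VIEW, SEARCH_CLICK, RATE }

    static final class Event {
        final long userId;
        final long itemId;
        final Action action;
        final double rating; // only meaningful when action == RATE

        Event(long userId, long itemId, Action action, double rating) {
            this.userId = userId;
            this.itemId = itemId;
            this.action = action;
            this.rating = rating;
        }
    }

    // Hypothetical weights; tune them for your own portal.
    static final double VIEW_WEIGHT = 1.0;
    static final double SEARCH_CLICK_WEIGHT = 2.0;
    static final double MAX_PREF = 5.0;

    /** Returns userId -> (itemId -> preference value). */
    static Map<Long, Map<Long, Double>> toPreferences(List<Event> events) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        // Pass 1: implicit signals accumulate per user and item, capped at MAX_PREF.
        for (Event e : events) {
            if (e.action == Action.RATE) {
                continue;
            }
            double weight = e.action == Action.VIEW ? VIEW_WEIGHT : SEARCH_CLICK_WEIGHT;
            prefs.computeIfAbsent(e.userId, u -> new HashMap<>())
                 .merge(e.itemId, weight, (old, add) -> Math.min(MAX_PREF, old + add));
        }
        // Pass 2: an explicit rating replaces whatever was inferred (last rating wins).
        for (Event e : events) {
            if (e.action == Action.RATE) {
                prefs.computeIfAbsent(e.userId, u -> new HashMap<>()).put(e.itemId, e.rating);
            }
        }
        return prefs;
    }
}
```

The resulting user-item-value triples are the kind of input a recommender
works from; how you weight each signal is a business decision.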


> Do you know if Mahout addresses all these requirements? I’m searching for
> articles related to Mahout integration (in
> https://cwiki.apache.org/confluence/display/MAHOUT/MahoutIntegration this
> section is blank), namely integration with the WCM we are going to propose
> (EZ Publish), but I was not able to find anything. Do you have any insights
> about how Mahout integrates with WCM solutions?

There is no particular integration with any content management system,
but I don't know what particular support would be appropriate. It runs
as a Java process, or a web-based service accessed over HTTP. That can
be accessed from just about anything.


> My last question, which I think is probably related to the previous one, is
> about Mahout integration with user session logging. Let’s say an anonymous
> user browses articles A1 and A2 and rates items I1 and I2. At this point the
> recommendation engine should be able to produce some recommendations for
> related articles and items (explicit or not). What happens to the user’s
> preferences when the anonymous user decides to log in to the website? Are
> they included in the data model and thus related to that user, or are they
> lost?

That is more up to your application than the engine. Anonymous users'
actions are still identifiable by cookie, for instance. You can create
them temporarily in the data model and recommend for them, yes. You
can merge two users later, with a bit of code. But that is mostly a
function of your web app, not the engine.
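
The "bit of code" for merging could look something like this plain-Java
sketch, kept on the web app side. The class, the string user keys (such as a
cookie-derived key and an account key) and the conflict policy (the account's
existing preference wins) are all assumptions for illustration; this is not a
Mahout data model.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an in-memory preference store kept by the web app. */
final class AnonymousMergeSketch {

    // userKey -> (itemId -> preference value); keys are hypothetical, e.g. "cookie:abc" or "user:42".
    private final Map<String, Map<Long, Double>> prefsByUserKey = new HashMap<>();

    void setPreference(String userKey, long itemId, double value) {
        prefsByUserKey.computeIfAbsent(userKey, k -> new HashMap<>()).put(itemId, value);
    }

    /** Read-only view of a user's preferences; empty if the user is unknown. */
    Map<Long, Double> preferencesOf(String userKey) {
        Map<Long, Double> prefs = prefsByUserKey.get(userKey);
        return prefs == null ? Collections.emptyMap() : Collections.unmodifiableMap(prefs);
    }

    /** On login: move the anonymous session's preferences onto the account, then drop the session. */
    void mergeInto(String anonymousKey, String accountKey) {
        Map<Long, Double> anonymous = prefsByUserKey.remove(anonymousKey);
        if (anonymous == null) {
            return; // nothing was recorded for this session
        }
        Map<Long, Double> account = prefsByUserKey.computeIfAbsent(accountKey, k -> new HashMap<>());
        for (Map.Entry<Long, Double> e : anonymous.entrySet()) {
            // Policy assumption: a preference already stored on the account wins.
            account.putIfAbsent(e.getKey(), e.getValue());
        }
    }
}
```

Calling mergeInto from the login handler means nothing the anonymous visitor
did is lost, and the temporary user disappears from the store afterwards.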