Posted to user@mahout.apache.org by Shem Cristobal <sh...@gmail.com> on 2011/05/07 14:40:50 UTC

Anyone Experienced in HTTP Logs as Data Source for Recommendations

Dear All, we are hoping to generate recommendations from the HTTP logs of a
certain web site. Is this even advisable? What sort of recommendations have
you built using such HTTP logs? Thanks a lot!



Best regards,

@shemcristobal

Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Ted Dunning <te...@gmail.com>.
Shem,

What Steven says is very much correct.  I have used web logs several times
for recommendations with very good results.

I would add to Steven's comment about how to interpret user actions that you
really need to think about which action indicates user interest.  It is
common to use clicks for this, but clicks are often a poor signal.  It is
better to use something more than a quick impulse, something that indicates
whether the user actually engaged with the item.

It is also very important to keep track of which items users had an
opportunity to engage with.  This ultimately helps with a lot of problems.
It helps you figure out who is a spammer.  It also helps you determine the
actual level of interest.

I recommend that you group all interactions and impressions by user id or by
session id and order by time.  That will let you extract features from the
session.  One important session feature is how long somebody actually spent
on the item you might be recommending.  If they went to another item very
quickly, that indicates lack of engagement.
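
The grouping and ordering step above can be sketched in a few lines of Python. This is only an illustration: the event tuples, the dwell_times helper and the 20-second threshold are assumptions made for the example, not something taken from any particular log format.

```python
from collections import defaultdict

# Hypothetical event records: (user_or_session_id, unix_seconds, item_id).
events = [
    ("u1", 100, "pageA"),
    ("u1", 103, "pageB"),
    ("u1", 190, "pageC"),
    ("u2", 50, "pageB"),
    ("u2", 400, "pageA"),
]

def dwell_times(events):
    """Group events by user, order them by time, and return
    (user, item, seconds) for every view followed by another view."""
    by_user = defaultdict(list)
    for user, ts, item in events:
        by_user[user].append((ts, item))
    result = []
    for user, views in by_user.items():
        views.sort()
        # The last view has no successor, so its dwell time is unknown.
        for (ts, item), (next_ts, _next_item) in zip(views, views[1:]):
            result.append((user, item, next_ts - ts))
    return result

MIN_DWELL_SECONDS = 20  # assumed threshold, tune it for your site

# Keep only the views where the user stayed long enough to count as engaged.
engaged = [(user, item)
           for user, item, secs in dwell_times(events)
           if secs >= MIN_DWELL_SECONDS]
```

Here the 3-second view of pageA by u1 is dropped, while the longer views of pageB are kept as engagement signals.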

It is very common that the logs you have initially don't contain the
high-quality information you want.  For instance, you might have a search
engine that you are trying to improve by looking at what people click on.
Your log might include the search query and the clicks, but it probably
doesn't include the top 20 items from the search, nor an indicator of how
long the user spent on the page they clicked through to.  The click is
motivated by the snippet you show, but that is a very noisy indicator of the
content, so it can be misleading.

The improvements you could make to logs like this would be to log all of the
results that the user sees and to put a timer beacon on the page the user
clicks through to, which tells you when the user has spent 20 seconds on
that page.  If you see search, impression, click and beacon, then you know
that the page has attracted some real interest.  You can get started with
your initial log contents, but getting augmented data would help much more.
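
As a rough sketch of how such an augmented log might be interpreted, the Python below labels each (session, item) pair by the events seen for it. The tuple layout, the event names and the three labels are assumptions chosen for illustration.

```python
# Hypothetical augmented log: (session_id, event_type, item_id).
# The "beacon" event is assumed to fire after 20 seconds on the clicked page.
log = [
    ("s1", "search", None),
    ("s1", "impression", "doc1"),
    ("s1", "impression", "doc2"),
    ("s1", "impression", "doc3"),
    ("s1", "click", "doc1"),
    ("s1", "beacon", "doc1"),
    ("s1", "click", "doc2"),
]

def classify(log):
    """Return {(session, item): label} for every item that was shown."""
    seen = {}
    for session, event, item in log:
        if item is not None:
            seen.setdefault((session, item), set()).add(event)
    labels = {}
    for key, kinds in seen.items():
        if "impression" not in kinds:
            continue  # never shown, so interest cannot be judged
        if {"click", "beacon"} <= kinds:
            labels[key] = "engaged"   # impression, click and beacon
        elif "click" in kinds:
            labels[key] = "clicked"   # clicked but left quickly
        else:
            labels[key] = "ignored"   # shown but not clicked
    return labels

labels = classify(log)
positives = [key for key, label in labels.items() if label == "engaged"]
```

Only the pairs in positives would be passed on as training data; the "ignored" pairs remain available as evidence of what the user saw and passed over.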

As a secondary point, measuring engagement instead of initial interest can
also make the recommendation process itself faster, because the
data size goes down dramatically.


On Sat, May 7, 2011 at 12:02 PM, Steven Bourke <sb...@gmail.com> wrote:

> Hi Shem,
>
> I've tried something similar, and it is indeed more than possible. The real
> problem comes down to how you'll actually interpret user interactions on
> the site. A user's behaviour may vary drastically across sessions. Also, if
> you are just tracking by IP address you may lose the real personalisation
> aspect. In my case I used an (IP, web page) representation and recommended
> based on the most popular items.
>
> Seems to be sufficient.

Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Steven Bourke <sb...@gmail.com>.
Hi Shem,

I've tried something similar, and it is indeed more than possible. The real
problem comes down to how you'll actually interpret user interactions on
the site. A user's behaviour may vary drastically across sessions. Also, if
you are just tracking by IP address you may lose the real personalisation
aspect. In my case I used an (IP, web page) representation and recommended
based on the most popular items.

Seems to be sufficient.
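
A minimal Python sketch of a popularity-based recommender over (IP, page) pairs follows. The sample data and the recommend_popular helper are hypothetical, and filtering out pages the IP has already visited is an added assumption rather than a detail of the setup described above.

```python
from collections import Counter

# Hypothetical (ip, page) pairs extracted from the logs.
visits = [
    ("10.0.0.1", "/a"), ("10.0.0.1", "/b"),
    ("10.0.0.2", "/a"), ("10.0.0.2", "/c"),
    ("10.0.0.3", "/a"), ("10.0.0.3", "/c"),
]

def recommend_popular(visits, ip, n=2):
    """Return up to n of the most visited pages that this IP has not seen."""
    # Count each (ip, page) pair once so repeat visits do not inflate popularity.
    unique = set(visits)
    popularity = Counter(page for _ip, page in unique)
    seen = {page for visit_ip, page in unique if visit_ip == ip}
    ranked = [page for page, _count in popularity.most_common()
              if page not in seen]
    return ranked[:n]

for_known_ip = recommend_popular(visits, "10.0.0.1")
for_new_ip = recommend_popular(visits, "10.0.0.9")
```

An IP with no history simply receives the globally most popular pages, which is the non-personalised fallback mentioned above.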


Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Benson Margulies <bi...@gmail.com>.
Did you really mean to send this? It's not obviously relevant even if
translated into English.

2011/5/7 Danny Leshem <dl...@gmail.com>:
> (18) Reminds me a bit of Derby Bar... but judging from an internet search, I don't think it's related.
> (15) That is of course Nanuchka on Lilienblum.

RE: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Danny Leshem <dl...@gmail.com>.
(18) Reminds me a bit of Derby Bar... but judging from an internet search, I don't think it's related.
(15) That is of course Nanuchka on Lilienblum.


Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Sean Owen <sr...@gmail.com>.
As far as Mahout is concerned, you just need input of the form
"user,item" (no rating necessary) where those are two numerical
identifiers. I imagine each logged request contains something like a
user ID and the thing you want to recommend -- video ID, item ID,
etc. (If it's not numeric, you'd have to hash it and store the
mapping, since you do need numeric IDs.)
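
One way to do the hashing and keep the mapping is sketched below in Python. The to_numeric_id function and the IdMapper class are hypothetical helpers written for this example, and taking the first 8 bytes of an MD5 digest as a signed 64-bit value is just one reasonable choice, made so the result fits a Java long.

```python
import hashlib

def to_numeric_id(raw):
    """Map a string to a signed 64-bit integer from the first 8 bytes of its MD5."""
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)

class IdMapper:
    """Remembers which original string each numeric ID came from."""

    def __init__(self):
        self.to_raw = {}

    def encode(self, raw):
        numeric = to_numeric_id(raw)
        existing = self.to_raw.setdefault(numeric, raw)
        if existing != raw:
            # Extremely unlikely with 64 bits, but worth detecting.
            raise ValueError("hash collision: %r and %r" % (existing, raw))
        return numeric

    def decode(self, numeric):
        return self.to_raw[numeric]

mapper = IdMapper()
video_id = mapper.encode("video-abc")
```

After recommendations come back as numeric IDs, mapper.decode turns them back into the original identifiers.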

You would need to use algorithms appropriate for use when there are no
ratings, though. Are you thinking of using Hadoop or a non-distributed
version?

You can do a translation from your logs to a simple CSV format like
the above and use that as input. You can also modify the code to read
your log format directly if you like, and avoid the translation step.
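
A translation step of this kind might look like the Python sketch below. It assumes the logs are in Common Log Format, treats the client IP as the user and the request path as the item, and assigns sequential numeric IDs; all of these are assumptions for illustration, and the sample lines are made up.

```python
import csv
import io
import re

# Assumes Common Log Format; adjust the pattern for your server's format.
LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3}) \S+')

sample_log = '''\
1.2.3.4 - - [07/May/2011:14:40:50 +0000] "GET /video/17 HTTP/1.1" 200 512
1.2.3.4 - - [07/May/2011:14:41:10 +0000] "GET /video/42 HTTP/1.1" 200 734
5.6.7.8 - - [07/May/2011:14:42:00 +0000] "GET /video/17 HTTP/1.1" 200 512
5.6.7.8 - - [07/May/2011:14:42:05 +0000] "GET /missing HTTP/1.1" 404 0
'''

def logs_to_csv(lines):
    """Return (csv_text, user_ids, item_ids) with one "user,item" row per hit."""
    user_ids, item_ids = {}, {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        match = LINE.match(line)
        # Skip unparseable lines and anything that is not a successful request.
        if not match or match.group(3) != "200":
            continue
        ip, path = match.group(1), match.group(2)
        # Assign the next sequential number the first time a value is seen.
        user = user_ids.setdefault(ip, len(user_ids) + 1)
        item = item_ids.setdefault(path, len(item_ids) + 1)
        writer.writerow([user, item])
    return out.getvalue(), user_ids, item_ids

csv_text, user_ids, item_ids = logs_to_csv(sample_log.splitlines())
```

The returned dictionaries are the stored mapping: they are needed later to turn recommended numeric item IDs back into paths.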

If you can say more about what you want to do, we can probably say more
about how to do it.


Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Federico Castanedo <fc...@inf.uc3m.es>.
Hi Shem,

I would like to recommend this paper to you:
http://research.microsoft.com/en-us/um/people/sdumais/chi08-adaretal-final.pdf

It is not directly related to recommendations, but it is a good study of web
log patterns.

Best,
Federico

2011/5/9 Shem Cristobal <sh...@gmail.com>:
> Thanks Sean, Markus, Steven and Ted for your input. I'm much enlightened
> and can now proceed to build a sound recommendation system based on HTTP
> logs.

Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Shem Cristobal <sh...@gmail.com>.
Thanks Sean, Markus, Steven and Ted for your input. I'm much enlightened
and can now proceed to build a sound recommendation system based on HTTP
logs.



-- 
Best regards,

@shemcristobal

Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

Posted by Markus Weimer <ma...@weimo.de>.
Hi,

Yahoo! uses recommender systems to personalize at least the front page,
and the principal input to those is click logs:

http://books.nips.cc/papers/files/nips21/NIPS2008_0916.pdf

Take care,

Markus