Posted to user@mahout.apache.org by Mark <st...@gmail.com> on 2011/06/18 19:52:33 UTC
Trending patterns
Sorry if this isn't the right place to ask, but how would I go about
finding trending data over a certain period of time?
For example: http://www.ebay.com has a section "Trends on eBay" that is
updated daily. I was wondering how this can be accomplished using Mahout
(if possible).
For input I have:
- user searches by day
- titles of products purchased by day
Would this require some sort of clustering or classification?
Thanks in advance
Re: Trending patterns
Posted by Ted Dunning <te...@gmail.com>.
Two things will help in addition to what Josh suggested:
a) when looking for items that are trending hot, use the difference in the
log rank as a score. For most internetly things, rank is proportional to
1/rate, so log rank is -log rate up to an additive constant. Refining this
slightly to -log (epsilon + 1/rank) makes things a little less jumpy.
b) use various forms of coalescence. If you are trending queries, normalize
the queries by sorting terms. If you have a category handy, try that.
Always invent a display name, of course. Usually, I just use the most
common input that maps to a coalesced group.
Item (b) may involve clustering or it may not. Depends on the data you have
and the exact results you want.
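
A minimal Python sketch of (a) and (b), under stated assumptions. The
function names, the epsilon value of 0.01, lowercasing, and whitespace
tokenization are all illustrative choices, not something specified in this
thread. It also assumes that an item absent from the earlier day has
effectively infinite rank, so its smoothed log rank is -log(epsilon).

```python
import math
from collections import Counter

def normalize(query):
    """Coalesce query variants: lowercase, split on whitespace, sort terms."""
    return " ".join(sorted(query.lower().split()))

def smoothed_log_ranks(counts, epsilon=0.01):
    """Map each key to -log(epsilon + 1/rank), where rank 1 is most frequent."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {k: -math.log(epsilon + 1.0 / rank)
            for rank, k in enumerate(ordered, start=1)}

def trending(earlier, later, epsilon=0.01):
    """Score each coalesced query by how far its smoothed log rank dropped."""
    old = smoothed_log_ranks(Counter(map(normalize, earlier)), epsilon)
    new = smoothed_log_ranks(Counter(map(normalize, later)), epsilon)
    unseen = -math.log(epsilon)  # assumed value for items with no earlier rank
    scores = {k: old.get(k, unseen) - new[k] for k in new}
    return sorted(scores.items(), key=lambda kv: -kv[1])

def display_names(queries):
    """Pick the most common raw input in each coalesced group as its label."""
    variants = {}
    for q in queries:
        variants.setdefault(normalize(q), Counter())[q] += 1
    return {k: c.most_common(1)[0][0] for k, c in variants.items()}

if __name__ == "__main__":
    yesterday = ["red shoes"] * 5 + ["shoes red"] + ["ipod nano"] * 3
    today = (["ipod nano"] * 6 + ["nano ipod"] * 2
             + ["red shoes"] * 4 + ["garden gnome"] * 3)
    names = display_names(today)
    for key, score in trending(yesterday, today):
        print(f"{names[key]:15s} {score:+.3f}")
```

With epsilon in place, a query that appears for the first time gets a large
but bounded score instead of an infinite one, which is one way to read the
"less jumpy" remark above.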
Re: Trending patterns
Posted by Mat Kelcey <ma...@gmail.com>.
<shameless-self-promotion>
two of my most popular blog posts are on this exactly!
a trending topics algorithm (note: not a _frequent_ topics algorithm,
a _trending_ topics algorithm)
http://matpalm.com/blog/2010/04/27/trending-topics-in-tweets-about-cheese-part1/
and implemented in pig
http://matpalm.com/blog/2010/05/01/trending-topics-in-tweets-about-cheese-part2/
though it doesn't have anything to do with Mahout...
mat
</shameless-self-promotion>
Re: Trending patterns
Posted by Josh Patterson <jo...@cloudera.com>.
I think the simplest way to do this would be to bucket/group-by
the timestamp and then look for the most frequent search/item/product
in each bucket. Fairly simple MapReduce job.
Josh
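
The bucketing idea can be sketched in plain, single-machine Python rather
than as an actual MapReduce job. The function name, the (day, item) input
shape, and the default k are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def top_per_day(events, k=3):
    """Group (day, item) pairs by day and return the k most frequent items.

    The grouping step plays the role of the shuffle on the day key, and the
    per-day Counter plays the role of the reducer.
    """
    buckets = defaultdict(Counter)
    for day, item in events:
        buckets[day][item] += 1
    return {day: counts.most_common(k) for day, counts in buckets.items()}

if __name__ == "__main__":
    events = [
        ("2011-06-17", "ipod"), ("2011-06-17", "ipod"),
        ("2011-06-17", "shoes"), ("2011-06-18", "gnome"),
    ]
    for day, top in sorted(top_per_day(events, k=2).items()):
        print(day, top)
```

Note that this yields the most frequent items per day. Turning that into a
trend still requires comparing buckets across days.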
--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv