Posted to user@mahout.apache.org by Mark <st...@gmail.com> on 2011/06/18 19:52:33 UTC

Trending patterns

Sorry if this isn't the right place to ask, but how would I go about
finding trending data over a certain period of time?

For example, http://www.ebay.com has a section "Trends on eBay" that is
updated daily. I was wondering how this could be accomplished using Mahout,
if possible.

For input I have:
     - user searches by day
     - titles of products purchased by day

Would this require some sort of clustering or classification?

Thanks in advance

Re: Trending patterns

Posted by Ted Dunning <te...@gmail.com>.
Two things will help in addition to what Josh suggested:

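a) when looking for items that are trending hot, use the difference in log
rank between periods as the score.  For most internet data, rank is
proportional to 1/rate, so log rank is -log rate plus a constant.  Refining
this slightly to -log(epsilon + 1/rank) makes things a little less jumpy:
for deep ranks where 1/rank is much smaller than epsilon, the value flattens
out near -log(epsilon), so churn among rarely seen items barely moves the
score.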

b) use various forms of coalescence.  If you are trending queries, normalize
the queries, for example by sorting their terms so that word order doesn't
matter.  If you have a category handy, try grouping by that as well.  Always
invent a display name for each group, of course.  Usually, I just use the
most common input that maps to a coalesced group.
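A minimal Python sketch of suggestion (a), assuming per-period counts are already available as dicts mapping item to count. The helper names (ranks, smoothed_log_rank, trending_scores), the default epsilon of 0.01, and giving items unseen in the previous period a rank just past the end of that period's list are illustrative choices, not anything from Mahout.

```python
import math


def ranks(counts):
    """Map each item to its 1-based rank by descending count (ties broken by item name)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {item: i + 1 for i, (item, _) in enumerate(ordered)}


def smoothed_log_rank(rank, epsilon):
    # -log(epsilon + 1/rank): equals log(rank) when epsilon == 0, and
    # flattens out near -log(epsilon) for deep ranks where 1/rank << epsilon.
    return -math.log(epsilon + 1.0 / rank)


def trending_scores(prev_counts, curr_counts, epsilon=0.01):
    """Score = smoothed log rank in the previous period minus smoothed log rank now.

    Positive scores mean the item moved up the ranking.
    Assumption: items missing from the previous period get rank len(prev) + 1.
    """
    prev_rank = ranks(prev_counts)
    curr_rank = ranks(curr_counts)
    unseen = len(prev_rank) + 1
    scores = {}
    for item, r_now in curr_rank.items():
        r_then = prev_rank.get(item, unseen)
        scores[item] = (smoothed_log_rank(r_then, epsilon)
                        - smoothed_log_rank(r_now, epsilon))
    # Fastest risers first.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Positive scores mean an item climbed the ranking, so the top of the sorted list is what is trending rather than simply what is most frequent.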

Item (b) may or may not involve clustering.  It depends on the data you have
and the exact results you want.
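A minimal sketch of the query half of suggestion (b) in Python. Sorting the terms and using the most common raw input as a group's display name follow the text; lowercasing and whitespace tokenization are extra assumptions, category-based grouping is not shown, and normalize_query and coalesce are hypothetical names.

```python
from collections import Counter, defaultdict


def normalize_query(query):
    # Sort the terms so word order doesn't matter.
    # Assumption: lowercase and split on whitespace before sorting.
    return " ".join(sorted(query.lower().split()))


def coalesce(query_counts):
    """Group raw queries by normalized form.

    Returns {display_name: total_count}, where display_name is the most
    common raw query that maps to the group.
    """
    groups = defaultdict(Counter)
    for raw, count in query_counts.items():
        groups[normalize_query(raw)][raw] += count
    result = {}
    for members in groups.values():
        display_name, _ = members.most_common(1)[0]
        result[display_name] = sum(members.values())
    return result
```

For example, "red shoes", "shoes red" and "Red Shoes" collapse into one group displayed as whichever spelling users typed most often, and the combined counts can then be ranked per period.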


Re: Trending patterns

Posted by Mat Kelcey <ma...@gmail.com>.
<shameless-self-promotion>
two of my most popular blog posts are on exactly this!

a trending topics algorithm (note: not a _frequent_ topics algorithm,
a _trending_ topics algorithm)
http://matpalm.com/blog/2010/04/27/trending-topics-in-tweets-about-cheese-part1/

and implemented in pig
http://matpalm.com/blog/2010/05/01/trending-topics-in-tweets-about-cheese-part2/

though it doesn't have anything to do with mahout...
mat
</shameless-self-promotion>


Re: Trending patterns

Posted by Josh Patterson <jo...@cloudera.com>.
I think the simplest way to do this would be to bucket (group by) the
timestamp, e.g. by day, and then look for the most frequent search/item/product
in each bucket. It's a fairly simple MapReduce job.
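The bucketing approach can be sketched as a single-process imitation of the map and reduce steps in Python. This is not Hadoop or Mahout API code; the (timestamp, term) event format, UTC day buckets, and the top-k cutoff are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def day_bucket(ts):
    # Bucket a Unix timestamp (seconds) by UTC calendar day, e.g. "1970-01-01".
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


def map_phase(events):
    # "Map": emit ((day, term), 1) for each (timestamp, term) event.
    for ts, term in events:
        yield (day_bucket(ts), term), 1


def reduce_phase(pairs, k=3):
    # "Reduce": sum counts per (day, term), then keep the top-k terms per day.
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    per_day = defaultdict(Counter)
    for (day, term), n in totals.items():
        per_day[day][term] = n
    return {day: c.most_common(k) for day, c in per_day.items()}
```

In an actual MapReduce job, the counting and the per-day top-k selection would likely be separate passes, since the first reduce is keyed by (day, term) rather than by day alone.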

Josh

-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv