Posted to user@mahout.apache.org by Venkatesh U <ve...@gmail.com> on 2012/03/09 08:35:17 UTC

Some data sets with class imbalance

Dear friends,
I am working on an algorithm which works well on imbalanced data. I need
some data sets available in the public domain which I can use to test how
my algorithm addresses class imbalance. Any pointers to data sets with
class imbalance would be appreciated.

Thanks,
Venkatesh

Re: Some data sets with class imbalance

Posted by Nick Pentreath <ni...@gmail.com>.
For binary classification, any click-through data (like online ad click-through data) is extremely unbalanced, with on the order of <0.5% positive examples.
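
To give a feel for what a positive rate of that order means, here is a minimal Python sketch on synthetic labels (not real click-through data). The 0.5% rate, the sample size, and the helper names make_labels and summarize are assumptions made up for this illustration. It shows that a classifier which always predicts "no click" is already roughly 99.5% accurate, which is why plain accuracy says little on data like this.

```python
import random


def make_labels(n, positive_rate, seed=0):
    """Draw n synthetic binary labels; 1 = click, 0 = no click (illustrative only)."""
    rng = random.Random(seed)
    return [1 if rng.random() < positive_rate else 0 for _ in range(n)]


def summarize(labels):
    """Count the classes and score the trivial 'always predict 0' baseline."""
    n = len(labels)
    positives = sum(labels)
    negatives = n - positives
    return {
        "positives": positives,
        "negatives": negatives,
        "positive_rate": positives / n,
        # Accuracy of a model that never predicts a click.
        "baseline_accuracy": negatives / n,
    }


# Assumed values: 200,000 examples at a 0.5% positive rate.
labels = make_labels(200_000, 0.005)
stats = summarize(labels)
print(stats)
```

On data like this, per-class measures such as precision and recall on the positive class are usually more informative than overall accuracy.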

Yahoo has some large data sets of this nature that can be downloaded free for research purposes from Yahoo Research (I think it's research.yahoo.com).

N

On 9 Mar 2012, at 09:35, Venkatesh U <ve...@gmail.com> wrote:

> Dear friends,
> I am working on an algorithm which works well on imbalanced data, I need
> some data sets available in public domain which I can use to test my
> algorithm for addressing class imbalance. Any pointers to data sets with
> class imbalance appreciated.
> 
> Thanks,
> Venkatesh