Posted to user@mahout.apache.org by Venkatesh U <ve...@gmail.com> on 2012/03/09 08:35:17 UTC
Some data sets with class imbalance
Dear friends,
I am working on an algorithm that works well on imbalanced data. I need
some publicly available data sets that I can use to test how my
algorithm addresses class imbalance. Any pointers to data sets with
class imbalance would be appreciated.
Thanks,
Venkatesh
Re: Some data sets with class imbalance
Posted by Nick Pentreath <ni...@gmail.com>.
For binary classification, any click-through data (such as online ad click-through data) is extremely imbalanced: positive examples are on the order of 0.5% of the data or less.
Yahoo has some large data sets of this nature that can be downloaded free for research purposes from Yahoo Research (I think it's research.yahoo.com).
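Until a real data set is in hand, a synthetic stand-in with a similar positive rate can serve as a quick sanity check. Below is a minimal sketch in Python, assuming a positive rate of 0.5% (the upper end of the figure above). The function name make_imbalanced_labels and its parameters are hypothetical, and it produces labels only, with no features, so it is no substitute for real click-through data.

```python
import random


def make_imbalanced_labels(n, positive_rate=0.005, seed=0):
    """Return n binary labels, each positive with probability positive_rate.

    Hypothetical helper for illustration; the 0.5% default is an assumption
    taken from the rough click-through figure quoted in this thread.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [1 if rng.random() < positive_rate else 0 for _ in range(n)]


labels = make_imbalanced_labels(100_000)
positives = sum(labels)
print(f"positives: {positives} of {len(labels)} ({positives / len(labels):.3%})")
```

Changing positive_rate lets you see how the algorithm behaves as the imbalance becomes more or less severe.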
N