Posted to user@mahout.apache.org by praneet mhatre <pr...@gmail.com> on 2012/05/09 05:06:29 UTC

High Dimensional Datasets for Binary Classification

Hi All / Ted,

I looked through the mailing list first, since similar questions have been
asked before, but couldn't really find what I wanted.

Quick background - I have been working on higher-order learning algorithms
(Feature Sharding to be specific) for some time. While getting this stuff
into Mahout will require some solid progress on the Pig/Mahout integration
front among other things, I have been exploring how vertical sharding
generally affects classifier performance using some simple code I've
written in Weka.

Most of my studies so far have been done on moderate-dimensional datasets.
Can someone please suggest some high- or very-high-dimensional datasets that
are suitable for binary classification and freely available?

Thank you!

-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine