Posted to commits@mahout.apache.org by co...@apache.org on 2010/01/22 02:04:00 UTC

[CONF] Apache Lucene Mahout > Collections

Space: Apache Lucene Mahout (http://cwiki.apache.org/confluence/display/MAHOUT)
Page: Collections (http://cwiki.apache.org/confluence/display/MAHOUT/Collections)


Edited by Grant Ingersoll:
---------------------------------------------------------------------
TODO: Organize these somehow.
Organize by usage? (classification, recommendation, etc.)


[theinfo|http://theinfo.org/]

[http://www.cs.technion.ac.il/~gabr/resources/data/ne_datasets.html]

[4 Universities Data Set|http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/]

[20Newsgroups|http://people.csail.mit.edu/jrennie/20Newsgroups/]

[UniProt|http://beta.uniprot.org/]
[Netflix Prize/Dataset|http://www.netflixprize.com/download]
[WordNet|http://wordnet.princeton.edu/obtain]
[DBPedia|http://wiki.dbpedia.org/Downloads30]
[UCI Machine Learning Repo|http://archive.ics.uci.edu/ml/]

[http://mloss.org/community/blog/2008/sep/19/data-sources/]

[http://www.icwsm.org/2009/data/]

[Book usage and recommendation data from the University of Huddersfield|http://library.hud.ac.uk/data/usagedata/]

http://ece.ut.ac.ir/DBRG/Hamshahri/ (Approximately 160k categorized docs)
There is a newer beta version here:
http://ece.ut.ac.ir/DBRG/Hamshahri/ham2/ (Approximately 320k categorized docs)

http://data.gov
http://www.ckan.net/
http://www.guardian.co.uk/news/datablog/2010/jan/07/government-data-world
http://data.gov.uk/
