Posted to commits@mahout.apache.org by co...@apache.org on 2008/07/02 21:58:00 UTC

[CONF] Apache Lucene Mahout: index (page edited)

index (MAHOUT) edited by Lukas Vlcek
      Page: http://cwiki.apache.org/confluence/display/MAHOUT/index
   Changes: http://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=74539&originalVersion=35&revisedVersion=36

Comment:
---------------------------------------------------------------------

added link to IR Book (some chapters are more general, not only IR-specific, and it discusses Hadoop)

Change summary:
---------------------------------------------------------------------

added link to IR Book (some chapters are more general, not only IR-specific, and it discusses Hadoop)

Content:
---------------------------------------------------------------------

h1. Apache Mahout Wiki

Apache Mahout is a new project under the Apache Lucene top-level project (TLP) whose goal is to create scalable machine learning algorithms under the Apache license. For more information on the project goals, please see the [original proposal|http://ml-site.grantingersoll.com/index.php?title=Incubator_proposal].

{toc:style=disc|minlevel=2}

h2. General

[TODO]

[FAQ]

[HowToContribute]

[HowToBecomeACommitter]

[Hadoop|http://hadoop.apache.org]


h2. Community

[Books, Tutorials, Talks, Articles, News, etc. on Mahout|BooksTutorialsTalks]
[IssueTracker]
[MailingListArchives]
[PoweredBy]


h2. Design

[Collection(De-)Serialization]

[Matrix and Vector Needs]

h2. Algorithms

This section contains links to information, examples, use cases, etc. for the various algorithms we intend to implement. Click the individual links to learn more. The initial algorithm descriptions have been copied here from the original project proposal. The algorithms are grouped by the application setting they can be used for. Where an algorithm fits several applications, it is listed under the one used in the paper; descriptions of the versions implemented in our project will be added as soon as we start working on them.

Original Paper: [Map Reduce for Machine Learning on Multicore|http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf]

Papers related to Map Reduce:
   * [Evaluating MapReduce for Multi-core and Multiprocessor Systems|http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf]

Papers, videos and books related to machine learning in general:
   * [Collection of links to presentations on learning algorithms|http://www.inma.ucl.ac.be/~francois/blog/entries/entry_757.php]
   * [Programming Collective Intelligence|http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/ref=pd_bbs_sr_1/104-1017533-9408723?ie=UTF8&s=books&qid=1214593516&sr=1-1]
   * [Collective Intelligence in Action|http://www.amazon.com/Collective-Intelligence-Action-Satnam-Alag/dp/1933988312/ref=pd_bbs_sr_3?ie=UTF8&s=books&qid=1214545249&sr=1-3]
   * [Data Mining: Practical Machine Learning Tools and Techniques|http://www.cs.waikato.ac.nz/~ml/weka/book.html]
   * [Taming Text|http://www.manning.com/ingersoll/]
   * [Machine Learning|http://www.amazon.com/Machine-Learning-Mcgraw-Hill-International-Edit/dp/0071154671/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1214593709&sr=8-1]
   * [Pattern Recognition and Machine Learning (Information Science and Statistics)|http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1214593709&sr=8-2]
   * [Introduction to Information Retrieval|http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html]

Each algorithm is marked with its current status. Algorithms marked _integrated_ have an implementation in the development version of Mahout. Algorithms that are currently being developed are annotated with a link to the JIRA issue that tracks the specific implementation. These issues usually already contain patches, which are more or less complete depending on how much work has been spent on the issue so far. Algorithms that have not been touched yet are marked _open_.

h3. Classification

A general introduction to the most common text classification algorithms can be found at Google Answers: http://answers.google.com/answers/main?cmd=threadview&id=225316 . For information on the algorithms implemented in Mahout (or scheduled for implementation), please visit the following pages.

[Logistic Regression] (open, GSoC project)

[NaiveBayes] ([MAHOUT-9|http://issues.apache.org/jira/browse/MAHOUT-9])

[Complementary Naive Bayes] ([MAHOUT-60|http://issues.apache.org/jira/browse/MAHOUT-60])

[Support Vector Machines] (SVM) (open: [MAHOUT-14|http://issues.apache.org/jira/browse/MAHOUT-14])

[Neural Network] (open)

h3. Clustering

[Canopy Clustering] (integrated)

[k-Means] (integrated)

[Expectation Maximization] (EM) ([MAHOUT-28|http://issues.apache.org/jira/browse/MAHOUT-28])

[Mean Shift]

[Hierarchical Clustering] ([MAHOUT-19|http://issues.apache.org/jira/browse/MAHOUT-19])

[Dirichlet Process Clustering] ([MAHOUT-30|http://issues.apache.org/jira/browse/MAHOUT-30])

h3. Regression

[Locally Weighted Linear Regression] (open)

h3. Dimension reduction

[Principal Components Analysis ] (PCA) (open)

[Independent Component Analysis] (open)

[Gaussian Discriminative Analysis] (GDA) (open)

h3. Evolutionary Algorithms

see also: [MAHOUT-56|http://issues.apache.org/jira/browse/MAHOUT-56]

This section collects information, examples, use cases, etc. related to evolutionary algorithms.

Introductions and Tutorials:
   * [Evolutionary Algorithms Introduction|http://www.geatbx.com/docu/algindex.html]

h3. Non map reduce algorithms

Some algorithms and applications that have not yet been published in map-reduce form have been proposed on the mailing list. Since we do not restrict ourselves to Hadoop-only versions, these proposals are listed here.

[Hidden Markov Models] (HMM) (open)

[Recommendation Learning] (integrated)

h2. Data

[Collections]

h2. Historical Information

Project inspiration and formulation can be found at [http://ml-site.grantingersoll.com]

h2. Committer's Resources

[HowToUpdateTheWebsite]

[PatchCheckList]

[ReleaseToDo]

[Apache Machine Status|http://monitoring.apache.org/status/] -- check whether SVN and other resources are available

h3. Other Resources

[Committer's FAQ|http://www.apache.org/dev/committers.html]

[Apache Dev|http://www.apache.org/dev/]

---------------------------------------------------------------------
CONFLUENCE INFORMATION
This message is automatically generated by Confluence