Posted to commits@mahout.apache.org by co...@apache.org on 2008/02/03 16:21:00 UTC

[CONF] Apache Lucene Mahout: Principal Components Analysis (page created)

Principal Components Analysis (MAHOUT) created by Isabel Drost
   http://cwiki.apache.org/confluence/display/MAHOUT/Principal+Components+Analysis

Content:
---------------------------------------------------------------------

h1. Principal Components Analysis

Principal Components Analysis (PCA) reduces a high-dimensional data set to fewer dimensions. It can be used to identify patterns in data and to express the data in a lower-dimensional space in which similarities and differences are easier to see. Typical applications include face recognition and image compression.
There are several limitations to be aware of when working with PCA:

* Linearity assumption - the data is assumed to be a linear combination of some basis vectors. Non-linear methods such as kernel PCA alleviate this problem.
* Principal components are assumed to be orthogonal. Independent Component Analysis (ICA) tries to cope with this limitation.
* Mean and covariance are assumed to characterize the data sufficiently, which is fully true only for Gaussian-distributed data.
* Directions with large variance are assumed to carry the important dynamics of the data; directions with small variance are treated as noise.
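
To make the reduction described above concrete, here is a minimal single-machine sketch in plain Java. It is not Mahout code and does not reflect any planned package design: the class name {{PcaSketch}} and all method names are hypothetical. It centres the data, builds the sample covariance matrix, and approximates only the first principal component by power iteration (a simple stand-in for a full eigendecomposition), then projects the samples onto that direction.

```java
import java.util.Random;

// Hypothetical illustration only; names are not part of Mahout.
class PcaSketch {

    /** Column means of a row-major data matrix (one sample per row). */
    static double[] columnMeans(double[][] data) {
        int d = data[0].length;
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= data.length;
        }
        return mean;
    }

    /** Sample covariance matrix (divides by n - 1) of the mean-centred data. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mean = columnMeans(data);
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
                }
            }
        }
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cov[i][j] /= (n - 1);
            }
        }
        return cov;
    }

    /**
     * Approximates the unit-length direction of largest variance by power
     * iteration on the covariance matrix. The start vector is random with a
     * fixed seed; a start exactly orthogonal to the dominant eigenvector
     * would not converge, which is assumed not to happen here.
     */
    static double[] firstPrincipalComponent(double[][] data, int iterations) {
        double[][] cov = covariance(data);
        int d = cov.length;
        Random random = new Random(42);
        double[] v = new double[d];
        for (int i = 0; i < d; i++) {
            v[i] = random.nextDouble();
        }
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            double normSquared = 0.0;
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    w[i] += cov[i][j] * v[j];
                }
                normSquared += w[i] * w[i];
            }
            double norm = Math.sqrt(normSquared);
            if (norm == 0.0) {
                break; // covariance maps v to zero; keep the previous vector
            }
            for (int i = 0; i < d; i++) {
                v[i] = w[i] / norm;
            }
        }
        return v;
    }

    /** Projects each mean-centred sample onto the component (one value per sample). */
    static double[] project(double[][] data, double[] component) {
        double[] mean = columnMeans(data);
        double[] result = new double[data.length];
        for (int r = 0; r < data.length; r++) {
            for (int j = 0; j < component.length; j++) {
                result[r] += (data[r][j] - mean[j]) * component[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Four 2-dimensional samples that lie on the line y = 2x.
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        double[] pc = firstPrincipalComponent(data, 100);
        double[] reduced = project(data, pc);
        System.out.println(java.util.Arrays.toString(pc));
        System.out.println(java.util.Arrays.toString(reduced));
    }
}
```

The sketch also shows the limitations listed above: the component is a single straight direction (linearity), it is derived from the mean and covariance alone, and it is chosen purely because it has the largest variance.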

h2. Parallelization strategy

h2. Design of packages

---------------------------------------------------------------------