
Posted to commits@mahout.apache.org by co...@apache.org on 2008/02/03 16:12:01 UTC

[CONF] Apache Lucene Mahout: Support Vector Machines (page created)

Support Vector Machines (MAHOUT) created by Isabel Drost
http://cwiki.apache.org/confluence/display/MAHOUT/Support+Vector+Machines

Content:
---------------------------------------------------------------------

h1. Support Vector Machines

As with Naive Bayes, Support Vector Machines (SVMs for short) can be used to solve the task of assigning objects to classes. However, the way this task is solved is completely different from the approach taken by Naive Bayes.

Each object is treated as a point in an _n_-dimensional feature space, where _n_ is the number of features used to describe the objects numerically. In addition, each object is assigned a binary label; let us assume the labels are "positive" and "negative". During learning, the algorithm tries to find a hyperplane in that space that perfectly separates positive from negative objects.
It is easy to think of settings where this is impossible. To remedy this, each object can be assigned a so-called slack term that penalizes mistakes made during learning. That way, the algorithm is forced to find the hyperplane that keeps these penalties, and thus the mistakes on the training data, as small as possible.
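
To make the role of the slack terms concrete, here is a minimal sketch in Python. It assumes labels encoded as +1 and -1, a hand-picked candidate hyperplane (w, b), and the common soft-margin objective 0.5 * ||w||^2 + C * (sum of slacks); the data points, the hyperplane and the value of C are made up for illustration and are not taken from Mahout.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted label."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack(w, b, x, y):
    """Slack of one example with label y in {+1, -1}: max(0, 1 - y * (w.x + b))."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def soft_margin_objective(w, b, data, C):
    """Regularization term plus C times the total slack."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * sum(slack(w, b, x, y) for x, y in data)


# Illustrative data: the last positive point is the midpoint of the two
# negative points, so no hyperplane can separate the classes perfectly.
data = [
    ((2.0, 2.0), +1),
    ((3.0, 1.0), +1),
    ((-2.0, -1.0), -1),
    ((-1.0, -3.0), -1),
    ((-1.5, -2.0), +1),
]

# Hypothetical candidate hyperplane and penalty weight.
w, b, C = (0.5, 0.5), 0.0, 1.0

slacks = [slack(w, b, x, y) for x, y in data]
objective = soft_margin_objective(w, b, data, C)
print(slacks)
print(objective)
```

Examples on the correct side of the margin get a slack of zero, while a slack above 1 means the example is misclassified by the candidate hyperplane. A learner would search over w and b for the smallest objective; this sketch only evaluates a single candidate.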

Another way to overcome the problem of there being no hyperplane that separates positive from negative objects is to project each feature vector into a higher-dimensional feature space and search for a linear separating hyperplane in that new space.
Usually the main problem with learning in high-dimensional feature spaces is the so-called curse of dimensionality: there are fewer training examples available than free parameters to tune. For SVMs this problem is less severe, as SVMs impose additional structural constraints on their solutions: each separating hyperplane must have a maximal margin to all training examples. As a further consequence, the solution may depend on the information encoded in only a few of the examples.
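
The projection and the margin can be illustrated with a small Python sketch. It assumes one-dimensional inputs, a hypothetical feature map x -> (x, x^2), and two hand-picked candidate hyperplanes in the mapped space; none of these names or values come from Mahout, and the code only compares the candidates rather than solving the optimization problem.

```python
import math


def feature_map(x):
    """Hypothetical map from 1-D input to a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def signed_distance(w, b, z, y):
    """Distance of z from the hyperplane w.z + b = 0, positive if y is on the correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm


def geometric_margin(w, b, points):
    """Smallest signed distance over all labelled points."""
    return min(signed_distance(w, b, z, y) for z, y in points)


# In 1-D the positives lie on both sides of the negatives,
# so no single threshold separates the two classes.
raw = [(-3.0, +1), (-2.0, +1), (-1.0, -1), (0.0, -1), (1.0, -1), (2.0, +1), (3.0, +1)]
mapped = [(feature_map(x), y) for x, y in raw]

# Two candidate hyperplanes in the mapped space; both separate the classes.
wide = ((0.0, 1.0), -2.5)    # x^2 = 2.5
narrow = ((0.0, 1.0), -3.5)  # x^2 = 3.5

margin_wide = geometric_margin(*wide, mapped)
margin_narrow = geometric_margin(*narrow, mapped)

# Examples that lie exactly on the margin of the wider hyperplane.
support = sorted(
    x for x, y in raw
    if abs(signed_distance(*wide, feature_map(x), y) - margin_wide) < 1e-9
)
print(margin_wide, margin_narrow, support)
```

Both candidates classify every example correctly, but the first keeps a larger distance to the closest examples, which is what the maximal-margin constraint prefers. Only the examples collected in `support` touch that margin; moving any of the other examples slightly would not change it.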

h2. Strategy for parallelization

h2. Design of packages

---------------------------------------------------------------------