Posted to commits@mahout.apache.org by ap...@apache.org on 2015/03/29 20:54:01 UTC
svn commit: r1669950 - /mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
Author: apalumbo
Date: Sun Mar 29 18:54:00 2015
New Revision: 1669950
URL: http://svn.apache.org/r1669950
Log:
add references to bank marketing dataset and Frank's blog on SGD page.
Modified:
mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext?rev=1669950&r1=1669949&r2=1669950&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext Sun Mar 29 18:54:00 2015
@@ -13,10 +13,14 @@ The Mahout implementation uses Stochasti
large training sets to be used.
For a more detailed analysis of the approach, have a look at the [thesis of
-Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en).
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en) [1].
See MAHOUT-228 for the main JIRA issue for SGD.
+A more detailed overview of the Mahout Logistic Regression classifier, including a [detailed description of building a Logistic Regression classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/) for the classic [Iris flower dataset](http://en.wikipedia.org/wiki/Iris_flower_data_set), is also available [2].
+
+An example of training a Logistic Regression classifier on the [UCI Bank Marketing Dataset](http://mlr.cs.umass.edu/ml/datasets/Bank+Marketing) can be found [on the Mahout website](http://mahout.apache.org/users/classification/bankmarketing-example.html) [3].
+
<a name="LogisticRegression-Parallelizationstrategy"></a>
## Parallelization strategy
@@ -53,7 +57,7 @@ include
* The evolutionary optimization system (found in org.apache.mahout.ep)
<a name="LogisticRegression-Featurevectorencoding"></a>
-### Feature vector encoding
+## Feature vector encoding
Because the SGD algorithms need to have fixed length feature vectors and
because it is a pain to build a dictionary ahead of time, most SGD
@@ -78,7 +82,7 @@ Here is a class diagram for the encoders
![class diagram](../../images/vector-class-hierarchy.png)
<a name="LogisticRegression-SGDLearning"></a>
-### SGD Learning
+## SGD Learning
For the simplest applications, you can construct an
OnlineLogisticRegression and be off and running. Typically, though, it is
@@ -104,3 +108,15 @@ TrainNewsGroups example code.
![sgd class diagram](../../images/sgd-class-hierarchy.png)
+## References
+
+[1] [Thesis of
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en)
+
+[2] [An Introduction To Mahout's Logistic Regression SGD Classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/)
+
+## Examples
+
+[3] [SGD Bank Marketing Example](http://mahout.apache.org/users/classification/bankmarketing-example.html)
+
+