Posted to commits@mahout.apache.org by ap...@apache.org on 2015/03/29 20:54:01 UTC

svn commit: r1669950 - /mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext

Author: apalumbo
Date: Sun Mar 29 18:54:00 2015
New Revision: 1669950

URL: http://svn.apache.org/r1669950
Log:
add references to bank marketing dataset and Frank's blog on SGD page.

Modified:
    mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext?rev=1669950&r1=1669949&r2=1669950&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/logistic-regression.mdtext Sun Mar 29 18:54:00 2015
@@ -13,10 +13,14 @@ The Mahout implementation uses Stochasti
 large training sets to be used.
 
 For a more detailed analysis of the approach, have a look at the [thesis of
-Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en).
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en) [1].
 
 See MAHOUT-228 for the main JIRA issue for SGD.
 
+A more detailed overview of the Mahout Logistic Regression classifier and a [detailed description of building a Logistic Regression classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/) for the classic [Iris flower dataset](http://en.wikipedia.org/wiki/Iris_flower_data_set) are also available [2].
+
+An example of training a Logistic Regression classifier for the [UCI Bank Marketing Dataset](http://mlr.cs.umass.edu/ml/datasets/Bank+Marketing) can be found [on the Mahout website](http://mahout.apache.org/users/classification/bankmarketing-example.html) [3].
+
 
 <a name="LogisticRegression-Parallelizationstrategy"></a>
 ## Parallelization strategy
@@ -53,7 +57,7 @@ include
 * The evolutionary optimization system (found in org.apache.mahout.ep)
 
 <a name="LogisticRegression-Featurevectorencoding"></a>
-### Feature vector encoding
+## Feature vector encoding
 
 Because the SGD algorithms need to have fixed length feature vectors and
 because it is a pain to build a dictionary ahead of time, most SGD
@@ -78,7 +82,7 @@ Here is a class diagram for the encoders
 ![class diagram](../../images/vector-class-hierarchy.png)
 
 <a name="LogisticRegression-SGDLearning"></a>
-### SGD Learning
+## SGD Learning
 
 For the simplest applications, you can construct an
 OnlineLogisticRegression and be off and running.  Typically, though, it is
@@ -104,3 +108,15 @@ TrainNewsGroups example code.
 
 ![sgd class diagram](../../images/sgd-class-hierarchy.png)
 
+## References
+
+[1] [Thesis of
+Paul Komarek](http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en)
+
+[2] [An Introduction To Mahout's Logistic Regression SGD Classifier](http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/)
+
+## Examples
+
+[3] [SGD Bank Marketing Example](http://mahout.apache.org/users/classification/bankmarketing-example.html)
+
+