Posted to commits@mahout.apache.org by bu...@apache.org on 2015/03/29 20:54:05 UTC

svn commit: r945539 - in /websites/staging/mahout/trunk/content: ./ users/classification/logistic-regression.html

Author: buildbot
Date: Sun Mar 29 18:54:05 2015
New Revision: 945539

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/classification/logistic-regression.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Mar 29 18:54:05 2015
@@ -1 +1 @@
-1669854
+1669950

Modified: websites/staging/mahout/trunk/content/users/classification/logistic-regression.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/classification/logistic-regression.html (original)
+++ websites/staging/mahout/trunk/content/users/classification/logistic-regression.html Sun Mar 29 18:54:05 2015
@@ -261,8 +261,10 @@ production fraud detection and advertisi
 The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow
 large training sets to be used.</p>
 <p>For a more detailed analysis of the approach, have a look at the <a href="http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&amp;language=en">thesis of
-Paul Komarek</a>.</p>
+Paul Komarek</a> [1].</p>
 <p>See MAHOUT-228 for the main JIRA issue for SGD.</p>
+<p>A more detailed overview of the Mahout Logistic Regression classifier, together with a <a href="http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/">detailed description of building a Logistic Regression classifier</a> for the classic <a href="http://en.wikipedia.org/wiki/Iris_flower_data_set">Iris flower dataset</a>, is also available [2].</p>
+<p>An example of training a Logistic Regression classifier for the <a href="http://mlr.cs.umass.edu/ml/datasets/Bank+Marketing">UCI Bank Marketing Dataset</a> can be found <a href="http://mahout.apache.org/users/classification/bankmarketing-example.html">on the Mahout website</a> [3].</p>
 <p><a name="LogisticRegression-Parallelizationstrategy"></a></p>
 <h2 id="parallelization-strategy">Parallelization strategy</h2>
 <p>The bad news is that SGD is an inherently sequential algorithm.  The good
@@ -298,7 +300,7 @@ include</p>
 </li>
 </ul>
 <p><a name="LogisticRegression-Featurevectorencoding"></a></p>
-<h3 id="feature-vector-encoding">Feature vector encoding</h3>
+<h2 id="feature-vector-encoding">Feature vector encoding</h2>
 <p>Because the SGD algorithms need to have fixed length feature vectors and
 because it is a pain to build a dictionary ahead of time, most SGD
 applications use the hashed feature vector encoding system that is rooted
@@ -317,7 +319,7 @@ case you are getting your training data
 <p>Here is a class diagram for the encoders package:</p>
 <p><img alt="class diagram" src="../../images/vector-class-hierarchy.png" /></p>
 <p><a name="LogisticRegression-SGDLearning"></a></p>
-<h3 id="sgd-learning">SGD Learning</h3>
+<h2 id="sgd-learning">SGD Learning</h2>
 <p>For the simplest applications, you can construct an
 OnlineLogisticRegression and be off and running.  Typically, though, it is
 nice to have running estimates of performance on held out data.  To do
@@ -338,6 +340,12 @@ so that you don't have to.</p>
 the number of twiddlable knobs is pretty large.  For some examples, see the
 TrainNewsGroups example code.</p>
 <p><img alt="sgd class diagram" src="../../images/sgd-class-hierarchy.png" /></p>
+<h2 id="references">References</h2>
+<p>[1] <a href="http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&amp;language=en">Thesis of
+Paul Komarek</a></p>
+<p>[2] <a href="http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/">An Introduction To Mahout's Logistic Regression SGD Classifier</a></p>
+<h2 id="examples">Examples</h2>
+<p>[3] <a href="http://mahout.apache.org/users/classification/bankmarketing-example.html">SGD Bank Marketing Example</a></p>
    </div>
   </div>     
 </div>