Posted to commits@mahout.apache.org by bu...@apache.org on 2015/03/29 20:54:05 UTC
svn commit: r945539 - in /websites/staging/mahout/trunk/content: ./
users/classification/logistic-regression.html
Author: buildbot
Date: Sun Mar 29 18:54:05 2015
New Revision: 945539
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/classification/logistic-regression.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Mar 29 18:54:05 2015
@@ -1 +1 @@
-1669854
+1669950
Modified: websites/staging/mahout/trunk/content/users/classification/logistic-regression.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/classification/logistic-regression.html (original)
+++ websites/staging/mahout/trunk/content/users/classification/logistic-regression.html Sun Mar 29 18:54:05 2015
@@ -261,8 +261,10 @@ production fraud detection and advertisi
The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow
large training sets to be used.</p>
<p>For a more detailed analysis of the approach, have a look at the <a href="http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en">thesis of
-Paul Komarek</a>.</p>
+Paul Komarek</a> [1].</p>
<p>See MAHOUT-228 for the main JIRA issue for SGD.</p>
+<p>A more detailed overview of the Mahout Logistic Regression classifier and a <a href="http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/">detailed description of building a Logistic Regression classifier</a> for the classic <a href="http://en.wikipedia.org/wiki/Iris_flower_data_set">Iris flower dataset</a> are also available [2].</p>
+<p>An example of training a Logistic Regression classifier for the <a href="http://mlr.cs.umass.edu/ml/datasets/Bank+Marketing">UCI Bank Marketing Dataset</a> can be found <a href="http://mahout.apache.org/users/classification/bankmarketing-example.html">on the Mahout website</a> [3].</p>
<p><a name="LogisticRegression-Parallelizationstrategy"></a></p>
<h2 id="parallelization-strategy">Parallelization strategy</h2>
<p>The bad news is that SGD is an inherently sequential algorithm. The good
@@ -298,7 +300,7 @@ include</p>
</li>
</ul>
<p><a name="LogisticRegression-Featurevectorencoding"></a></p>
-<h3 id="feature-vector-encoding">Feature vector encoding</h3>
+<h2 id="feature-vector-encoding">Feature vector encoding</h2>
<p>Because the SGD algorithms need to have fixed length feature vectors and
because it is a pain to build a dictionary ahead of time, most SGD
applications use the hashed feature vector encoding system that is rooted
@@ -317,7 +319,7 @@ case you are getting your training data
<p>Here is a class diagram for the encoders package:</p>
<p><img alt="class diagram" src="../../images/vector-class-hierarchy.png" /></p>
<p><a name="LogisticRegression-SGDLearning"></a></p>
-<h3 id="sgd-learning">SGD Learning</h3>
+<h2 id="sgd-learning">SGD Learning</h2>
<p>For the simplest applications, you can construct an
OnlineLogisticRegression and be off and running. Typically, though, it is
nice to have running estimates of performance on held out data. To do
@@ -338,6 +340,12 @@ so that you don't have to.</p>
the number of twiddlable knobs is pretty large. For some examples, see the
TrainNewsGroups example code.</p>
<p><img alt="sgd class diagram" src="../../images/sgd-class-hierarchy.png" /></p>
+<h2 id="references">References</h2>
+<p>[1] <a href="http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en">Thesis of
+Paul Komarek</a></p>
+<p>[2] <a href="http://blog.trifork.com/2014/02/04/an-introduction-to-mahouts-logistic-regression-sgd-classifier/">An Introduction To Mahout's Logistic Regression SGD Classifier</a></p>
+<h2 id="examples">Examples</h2>
+<p>[3] <a href="http://mahout.apache.org/users/classification/bankmarketing-example.html">SGD Bank Marketing Example</a></p>
</div>
</div>
</div>