Posted to commits@mahout.apache.org by ap...@apache.org on 2015/04/05 22:14:59 UTC

svn commit: r1671423 - /mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext

Author: apalumbo
Date: Sun Apr  5 20:14:58 2015
New Revision: 1671423

URL: http://svn.apache.org/r1671423
Log:
MAHOUT-1559 add documentation for the Wikipedia classification example

Modified:
    mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext?rev=1671423&r1=1671422&r2=1671423&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-classifier-example.mdtext Sun Apr  5 20:14:58 2015
@@ -7,6 +7,8 @@ You can run this script to build and tes
 
 ## Overview
 
+To run the example, simply execute the `$MAHOUT_HOME/examples/bin/classify-wikipedia.sh` script.
+
 By default, the script is set to run on a medium-sized Wikipedia XML dump.  To run on the full set (the entire English Wikipedia), change the download by commenting out line 78 and uncommenting line 80 of [classify-wikipedia.sh](https://github.com/apache/mahout/blob/master/examples/bin/classify-wikipedia.sh) [1]. However, this is not recommended unless you have the resources to do so. *Be sure to clean your work directory when changing datasets: option (3).*
 
 The step-by-step process for creating a Naive Bayes classifier for the Wikipedia XML dump is very similar to that for [creating a 20 Newsgroups Classifier](http://mahout.apache.org/users/classification/twenty-newsgroups.html) [4].  The only difference is that instead of running `$mahout seqdirectory` on the unzipped 20 Newsgroups file, you'll run `$mahout seqwiki` on the unzipped Wikipedia XML dump.