Posted to commits@mahout.apache.org by is...@apache.org on 2013/11/20 21:14:16 UTC

svn commit: r1543926 - /mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext

Author: isabel
Date: Wed Nov 20 20:14:16 2013
New Revision: 1543926

URL: http://svn.apache.org/r1543926
Log:
MAHOUT-1245 - fix formatting of naive bayes wikipedia example

Modified:
    mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext?rev=1543926&r1=1543925&r2=1543926&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/classification/wikipedia-bayes-example.mdtext Wed Nov 20 20:14:16 2013
@@ -1,4 +1,7 @@
 Title: Wikipedia Bayes Example
+
+# Naive Bayes Wikipedia Example
+
 <a name="WikipediaBayesExample-Intro"></a>
 # Intro
 
@@ -13,28 +16,35 @@ what country an unseen article should be
 <a name="WikipediaBayesExample-Runningtheexample"></a>
 # Running the example
 
-1. download the wikipedia data set [here ](-http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2.html)
-1. unzip the bz2 file to get the enwiki-latest-pages-articles.xml. 
-1. Create directory $MAHOUT_HOME/examples/temp and copy the xml file into
-this directory
-1. Chunk the Data into pieces: {code}$MAHOUT_HOME/bin/mahout
-wikipediaXMLSplitter -d
-$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml -o
-wikipedia/chunks -c 64{code} {quote}*We strongly suggest you backup the
-results to some other place so that you don't have to do this step again in
-case it gets accidentally erased*{quote}
+1. Download the Wikipedia dataset [here](http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2.html).
+
+1. Unzip the bz2 file to get enwiki-latest-pages-articles.xml.
+
+1. Create the directory `$MAHOUT_HOME/examples/temp` and copy the XML file into it.
+
+1. Chunk the data into pieces: `$MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml -o wikipedia/chunks -c 64`
+*We strongly suggest you back up the results to some other place so that you don't have to repeat this step in
+case they get accidentally erased.*
+
 1. This creates the chunks in HDFS. Verify it by executing
-{code}hadoop fs -ls wikipedia/chunks{code} and it'll list all the xml files
-as chunk-0001.xml and so on.
+`hadoop fs -ls wikipedia/chunks`, which will list all the XML files as chunk-0001.xml and so on.
+
 1. Create the country-based split of the Wikipedia dataset.
-{code}$MAHOUT_HOME/bin/mahout  wikipediaDataSetCreator	-i wikipedia/chunks
--o wikipediainput -c $MAHOUT_HOME/examples/src/test/resources/country.txt
+`$MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks
+-o wikipediainput -c $MAHOUT_HOME/examples/src/test/resources/country.txt`.
 
-    # Verify the creation of input data set by executing {code} hadoop fs -ls
-wikipediainput {code} and you'll be able to see part-r-00000 file inside
+
+<br><br>
+
+After input preparation, start the actual training:
+
+
+* Verify the creation of the input data set by executing `hadoop fs -ls wikipediainput`; you will see a part-r-00000 file inside the
wikipediainput directory.
-    # Train the classifier: {code}$MAHOUT_HOME/bin/mahout trainclassifier -i
-wikipediainput -o wikipediamodel{code}. The model file will be available in
+
+* Train the classifier: `$MAHOUT_HOME/bin/mahout trainclassifier -i
+wikipediainput -o wikipediamodel`. The model file will be available in
 the wikipediamodel folder in HDFS.
-    # Test the classifier: {code}$MAHOUT_HOME/bin/mahout testclassifier -m
-wikipediamodel -d wikipediainput{code}
+
+* Test the classifier: `$MAHOUT_HOME/bin/mahout testclassifier -m
+wikipediamodel -d wikipediainput`.
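
For reference, the full command sequence documented above as a minimal shell sketch. It assumes `MAHOUT_HOME` points at a Mahout checkout with the examples built and that `hadoop` is on the PATH and configured; the file and directory names (`enwiki-latest-pages-articles10.xml`, `wikipedia/chunks`, `wikipediainput`, `wikipediamodel`) follow the example text, so adjust them to your setup.

```bash
# Sketch of the Wikipedia Bayes example workflow; assumes MAHOUT_HOME is set
# and hadoop is configured. Names follow the example above.

# 1. Prepare the input directory; download enwiki-latest-pages-articles.xml.bz2
#    from the page linked above, bunzip2 it, and copy the XML into examples/temp.
mkdir -p "$MAHOUT_HOME/examples/temp"

# 2. Split the dump into chunks in HDFS (the example uses -c 64).
"$MAHOUT_HOME/bin/mahout" wikipediaXMLSplitter \
  -d "$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml" \
  -o wikipedia/chunks -c 64
hadoop fs -ls wikipedia/chunks   # expect chunk-0001.xml, chunk-0002.xml, ...

# 3. Build the country-labelled input set from the chunks.
"$MAHOUT_HOME/bin/mahout" wikipediaDataSetCreator \
  -i wikipedia/chunks -o wikipediainput \
  -c "$MAHOUT_HOME/examples/src/test/resources/country.txt"
hadoop fs -ls wikipediainput     # expect a part-r-00000 file

# 4. Train, then test, the classifier.
"$MAHOUT_HOME/bin/mahout" trainclassifier -i wikipediainput -o wikipediamodel
"$MAHOUT_HOME/bin/mahout" testclassifier -m wikipediamodel -d wikipediainput
```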