Posted to user@mahout.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/07 14:11:07 UTC

mahout command

Hi
When I run 

    mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

I get this error
14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.


However, the wikipediaXMLSplitter class exists in mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java

I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like:

   export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/

Where should I add that?

 
Regards,
Mahmood

Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
What a fast reply... Thanks a lot, Suneel.

 
Regards,
Mahmood



On Saturday, March 8, 2014 11:29 PM, Suneel Marthi <su...@yahoo.com> wrote:
 
You can ignore the warnings. 





On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
Oh yes... Thanks Andrew, you are right.
Meanwhile I see two warnings:

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?


 
Regards,
Mahmood



On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <su...@yahoo.com> wrote:
 
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0





On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <an...@gmail.com> wrote:

You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
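So the program name has to match the lower-case declaration. Assuming the stock 0.9 layout above, the corrected invocation would be:

    bin/mahout wikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

(The registered name can be confirmed with something like

    bin/mahout 2>&1 | grep -i wikipedia

which filters the valid-program listing.)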



On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt...@yahoo.com> wrote:

> No success Suneel...
>
> Please see the attachment which is the output of
>      mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
> -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
>
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:186)
>     at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
>
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
>   arff.vector: : Generate Vectors from an ARFF file or directory
>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
>   canopy: : Canopy clustering
>   cat: : Print a file or resource as the logistic regression models would
> see it
>   cleansvd: : Cleanup and verification of SVD output
>   clusterdump: : Dump cluster output to text
>   clusterpp: : Groups Clustering Output In Clusters
>   cmdump: : Dump confusion matrix in HTML or text formats
>   concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
>   evaluateFactorization: : compute RMSE and MAE of a rating matrix
> factorization against probes
>   fkmeans: : Fuzzy K-means clustering
>   hmmpredict: : Generate random sequence of observations by given HMM
>   itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
>   kmeans: : K-means clustering
>   lucene.vector: : Generate Vectors from a Lucene index
>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
>   matrixdump: : Dump matrix in CSV format
>   matrixmult: : Take the product of two matrices
>   parallelALS: : ALS-WR factorization of a rating matrix
>   qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
>   recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
>   recommenditembased: : Compute recommendations using item-based
> collaborative filtering
>   regexconverter: : Convert text files on a per line basis based on
> regular expressions
>   resplit: : Splits a set of SequenceFiles into a number of equal splits
>   rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
>   rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
>   runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
>   runlogistic: : Run a logistic regression model against CSV data
>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
>   seq2sparse: : Sparse Vector generation from Text sequence files
>   seqdirectory: : Generate sequence files (of Text) from a directory
>   seqdumper: : Generic Sequence File dumper
>   seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
>   seqwiki: : Wikipedia xml dump to sequence file
>   spectralkmeans: : Spectral k-means clustering
>   split: : Split Input data into test and train sets
>   splitDataset: : split a rating dataset into training and probe parts
>   ssvd: : Stochastic SVD
>   streamingkmeans: : Streaming k-means clustering
>   svd: : Lanczos Singular Value Decomposition
>   testnb: : Test the Vector-based Bayes classifier
>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
>   trainlogistic: : Train a logistic regression using stochastic gradient
> descent
>   trainnb: : Train the Vector-based Bayes classifier
>   transpose: : Take the transpose of a matrix
>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
>   vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of Vectors
>   vectordump: : Dump vectors from a sequence file to text
>   viterbi: : Viterbi decoding of hidden states from given output states
> sequence
>   wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
> Regards,
> Mahmood
>
>
>   On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> please give that a try.
>
>
>   On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <
> nt_mahmood@yahoo.com> wrote:
>
> >mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0, which command is correct?
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
>
>
>   On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
>  Not sure what's so disappointing here, it was never officially announced
> that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
>
>   On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>  That is rather disappointing....
>
> >b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
>
> Regards,
> Mahmood
>
>
>   On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>  The example as documented on the Wiki should work. The issue is that you seem
> to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1 profile
> on a Hadoop 2.3 environment. I don't think that's going to work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
>
>
>
>   On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>  FYI, I am trying to complete the wikipedia example from Apache's document
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> In fact, see this file:
>     src/conf/driver.classes.default.props
>
> which is not exactly what you said. Still I have the same problem.
> Please see the complete log
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at
> java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:186)
>     at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 17:19:04 WARN driver.MahoutDriver: No
> wikipediaXMLSplitter.props found on classpath, will use command-line
> arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
>   arff.vector: : Generate Vectors from an ARFF file or directory
>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
>   canopy: : Canopy clustering
>   cat: : Print a file or resource as the logistic regression models would
> see it
>   cleansvd: : Cleanup and verification of SVD output
>   clusterdump: : Dump cluster output to text
>   clusterpp: : Groups Clustering Output In Clusters
>   cmdump: : Dump confusion matrix in HTML or text formats
>   concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
>   evaluateFactorization: : compute RMSE
> and MAE of a rating matrix factorization against probes
>   fkmeans: : Fuzzy K-means clustering
>   hmmpredict: : Generate random sequence of observations by given HMM
>   itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
>   kmeans: : K-means clustering
>   lucene.vector: : Generate Vectors from a Lucene index
>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
>   matrixdump: : Dump matrix in CSV format
>   matrixmult: : Take the product of two matrices
>   parallelALS: : ALS-WR factorization of a rating matrix
>   qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
>   recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
>   recommenditembased: : Compute recommendations using item-based
> collaborative filtering
>   regexconverter: : Convert text files on a per
> line basis based on regular expressions
>   resplit: : Splits a set of SequenceFiles into a number of equal splits
>   rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
>   rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
>   runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
>   runlogistic: : Run a logistic regression model against CSV data
>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
>   seq2sparse: : Sparse Vector generation from Text sequence files
>   seqdirectory: : Generate sequence files (of Text) from a directory
>   seqdumper: : Generic Sequence File dumper
>   seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
>   seqwiki: : Wikipedia xml dump to sequence file
>   spectralkmeans: : Spectral k-means clustering
>   split: : Split Input data into test and train sets
>   splitDataset: : split a rating dataset into training and probe parts
>   ssvd: : Stochastic SVD
>   streamingkmeans: : Streaming k-means clustering
>   svd: : Lanczos Singular Value Decomposition
>   testnb: : Test the Vector-based Bayes classifier
>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
>   trainlogistic: : Train a logistic regression using stochastic gradient
> descent
>   trainnb: : Train the Vector-based Bayes classifier
>   transpose: : Take the transpose of a matrix
>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
>   vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of
> Vectors
>   vectordump: : Dump vectors from a sequence file to text
>   viterbi: : Viterbi decoding of hidden states from given output states
> sequence
>   wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
>
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Mahmood,
>
> wikipediaXMLSplitter is not present in driver.classes.default.props. To
> accomplish what you are trying to do, you can edit
> src/conf/driver.classes.default.props and add an entry for
> wikipediaXMLSplitter:
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke via:
>
> mahout wikipediaXmlSplitter -d <path> -o <path> -c 64
>
> please give that a try.
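>
> For example, a minimal sketch of that edit (run from the
> mahout-distribution-0.9 root; the property line is the one quoted above):
>
>     echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' >> src/conf/driver.classes.default.props
>
> Passing the fully qualified class name should also work without any props
> entry, since MahoutDriver falls back to Class.forName on the program name
> (as the stack trace above shows):
>
>     mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter -d <path> -o <path> -c 64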
>
>
>
>
>
>
>
>
> On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> Hi
> When I run
>
>     mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> I get this error
> 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native
> Method)
>     at java.lang.Class.forName(Class.java:186)
>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at
> java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
>
>
> However the wikipediaXMLSplitter exists in
>
> mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
>
> I know that it is possible to pass the full path, but is there any way to
> define a variable that points to the correct location? Something like:
>
>    export
> WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
>
> Where should I add that?
>
>
> Regards,
> Mahmood
>
>
>
>
>
>
>
>
>
>
>
>
>

Re: mahout command

Posted by Suneel Marthi <su...@yahoo.com>.
You can ignore the warnings.

Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
Oh yes... Thanks Andrew, you are right.
Meanwhile I see two warnings:

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?

Regards,
Mahmood




Re: mahout command

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0
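A minimal sketch of that, assuming a checkout of the then-current trunk (the
GitHub mirror URL is an assumption; the hadoop2.version flag is the one from
earlier in this thread):

    git clone https://github.com/apache/mahout.git
    cd mahout
    mvn clean package -Dhadoop2.version=2.3.0 -DskipTests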




On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <an...@gmail.com> wrote:
 
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64




Re: mahout command

Posted by Andrew Musselman <an...@gmail.com>.
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.

Note the mismatch between the declared key and the command:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
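
With the case corrected to match the declared key, the invocation becomes:

    bin/mahout wikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

As an aside, MahoutDriver falls back to Class.forName on the program name (that is the lookup failing in the stack traces above), so invoking with the fully qualified class name should also work:

    bin/mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64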



Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
No success Suneel...

Please see the attachment which is the output of 
     mvn clean package -Dhadoop2.version=2.3.0

Additionally:

 
hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props 
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper


hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ 




Regards,
Mahmood



On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <su...@yahoo.com> wrote:
 

mvn clean package -Dhadoop2.version=2.3.0

please give that a try.




On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
 

>mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>

Excuse me, if I have 2.3.0, which command is correct?
mvn clean package -Dhadoop2.3.0.=2.3.0
mvn clean package -Dhadoop2.version=2.3.0
 
Regards,
Mahmood



On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <su...@yahoo.com> wrote:
 
Not sure what's so disappointing here; it was never officially announced that Mahout 0.9 had Hadoop 2.x support.

From trunk, can you build Mahout for Hadoop 2 using this command:

mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>



On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
That is rather disappointing.... 

>b) Work off of present Head and build with Hadoop 2.x profile. 
Can you explain more? 


 
Regards,
Mahmood



On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com> wrote:
 
The example as documented on the Wiki should work. The issue is you seem to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's gonna work.

Suggest that you either:

a) Switch to a Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile. 

Mahout 0.9 is not certified for Hadoop 2.x.






On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

 
Regards,
Mahmood




On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

In fact, see this file
    src/conf/driver.classes.default.props

which is not exactly what you said. Still I have the same problem. Please see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props 
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper



hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ 





 
Regards,
Mahmood



On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com> wrote:

Mahmood,

wikipediaXMLSplitter is not present in driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter.

org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

You should then be able to invoke via:

mahout wikipediaXmlSplitter -d<path> -o<path> -c64

please give that a try.
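
A quick way to append that entry from the top of the distribution (a sketch; adjust the path to your layout):

    echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' >> src/conf/driver.classes.default.props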








On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt...@yahoo.com> wrote:

Hi
When I run 

    mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

I get this error
14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.


However the wikipediaXMLSplitter exists in
mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java

I know that it is possible to pass the full path but is there any way to define a variable that points to the correct location. Something like 

   export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/

Where should I add that?

 
Regards,
Mahmood

Re: mahout command

Posted by Suneel Marthi <su...@yahoo.com>.
Not sure what's so disappointing here; it was never officially announced that Mahout 0.9 had Hadoop 2.x support.

From trunk, can you build Mahout for Hadoop 2 using this command:

mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
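
For example, for Hadoop 2.3.0:

    mvn clean package -Dhadoop2.version=2.3.0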



On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
That is rather disappointing.... 

>b) Work off of present Head and build with Hadoop 2.x profile. 
Can you explain more? 


 
Regards,
Mahmood



On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com> wrote:
 
The example as documented on the Wiki should work.  The issue u seem to be running Mahout 0.9 distro that was built with hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's gonna work.

Suggest that you either:

a) Switch to a
 Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile. 

Mahout 0.9 is not certified for Hadoop 2.x.






On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
 
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

 
Regards,
Mahmood




On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com> wrote:

In fact,  see this file
    src/conf/driver.classes.default.props

which is not exactly as what you said. Still I have the same problem. Please see the complete log

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props 
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump :
 Dump cluster output to
text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper



hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at
java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ 





 
Regards,
Mahmood




Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
That is rather disappointing...

>b) Work off of present Head and build with Hadoop 2.x profile. 
Can you explain more? 


 
Regards,
Mahmood




Re: mahout command

Posted by Suneel Marthi <su...@yahoo.com>.
The example as documented on the Wiki should work.  The issue is that you seem to be running a Mahout 0.9 distro that was built with the Hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's gonna work.

Suggest that you either:

a) Switch to a Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile. 

Mahout 0.9 is not certified for Hadoop 2.x.
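
For (b), something along these lines should work from a trunk checkout. This is only a minimal sketch: the GitHub mirror URL and the hadoop2.version property are my assumptions here, so check the POM for the switch that is actually exposed.

    # build trunk against a Hadoop 2.x line instead of the default 1.2.1
    git clone https://github.com/apache/mahout.git
    cd mahout
    mvn clean install -DskipTests -Dhadoop2.version=2.3.0

For (a), it should be enough to point the mahout wrapper script at a Hadoop 1.2.1 install before running it:

    export HADOOP_HOME=/path/to/hadoop-1.2.1
    export PATH=$HADOOP_HOME/bin:$PATH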







Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
FYI, I am trying to complete the Wikipedia example from Apache's documentation:
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

 
Regards,
Mahmood




Re: mahout command

Posted by Mahmood Naderan <nt...@yahoo.com>.
In fact, see this file:
    src/conf/driver.classes.default.props

which already contains exactly that entry. Still I have the same problem. Please see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props 
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper



hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ 





 
Regards,
Mahmood




Re: mahout command

Posted by Suneel Marthi <su...@yahoo.com>.
Mahmood,

wikipediaXMLSplitter is not present in driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter.

org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

You should then be able to invoke via:

mahout wikipediaXmlSplitter -d <path> -o <path> -c 64

Please give that a try.
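
For example, from the top of the distribution, something like this should do it (a minimal sketch, assuming the props file your bin/mahout script picks up is the shipped src/conf/driver.classes.default.props):

    # append the driver mapping, then invoke the splitter by its short name
    echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' >> src/conf/driver.classes.default.props
    bin/mahout wikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64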






