Posted to user@mahout.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/07 14:11:07 UTC
mahout command
Hi
When I run
mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
I get this error
14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
However the wikipediaXMLSplitter exists in mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like
export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
Where should I add that?
Regards,
Mahmood
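[Editor's aside, not part of the original mail: the `mahout` launcher does not resolve program names through an environment variable or a source path; as the replies below explain, it looks the name up as an alias in driver.classes.default.props, whose lines have the form `fully.qualified.Class = alias : description`. A minimal sketch of that lookup, run against an inline stand-in for the props file; the `lookup` helper is purely illustrative, only the props-line format comes from the thread.]

```shell
# Stand-in for src/conf/driver.classes.default.props (one real line from it).
props=$(mktemp)
cat > "$props" <<'EOF'
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
EOF

# The driver matches the typed program name against the alias field exactly,
# so the comparison below is case-sensitive, like the real lookup.
lookup() {
  awk -F'[=:]' -v a="$1" '{ gsub(/ /, "", $1); gsub(/ /, "", $2) }
                          $2 == a { print $1 }' "$props"
}

lookup wikipediaXmlSplitter   # resolves to the fully qualified class
lookup wikipediaXMLSplitter   # wrong case: resolves to nothing
```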
Re: mahout command
Posted by Mahmood Naderan <nt...@yahoo.com>.
What a fast reply... Thanks a lot Suneel,
Regards,
Mahmood
On Saturday, March 8, 2014 11:29 PM, Suneel Marthi <su...@yahoo.com> wrote:
You can ignore the warnings.
On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Oh yes... Thanks Andrew you are right
Meanwhile I see two warnings
WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Is there any concern about them?
Regards,
Mahmood
On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <su...@yahoo.com> wrote:
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0
On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <an...@gmail.com> wrote:
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.
Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
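[Editor's aside, not part of the original mail: a quick way to catch the kind of case mismatch Andrew points out above is to grep the props file for the name you typed, both exactly and case-insensitively. Sketched here against an inline copy of the relevant props line so it runs anywhere.]

```shell
# The alias line from the properties file, inlined so this is self-contained.
line='org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter'

# Exact search for the name as typed in the failing command: zero matches.
echo "$line" | grep -c 'wikipediaXMLSplitter'

# Case-insensitive search finds both the class name and the alias, revealing
# that the registered spelling is wikipediaXmlSplitter (lower-case "ml").
echo "$line" | grep -io 'wikipediaxmlsplitter'
```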
On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt...@yahoo.com>wrote:
> No success Suneel...
>
> Please see the attachment which is the output of
> mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
> -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
>
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
>
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> arff.vector: : Generate Vectors from an ARFF file or directory
> baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> canopy: : Canopy clustering
> cat: : Print a file or resource as the logistic regression models would
> see it
> cleansvd: : Cleanup and verification of SVD output
> clusterdump: : Dump cluster output to text
> clusterpp: : Groups Clustering Output In Clusters
> cmdump: : Dump confusion matrix in HTML or text formats
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE and MAE of a rating matrix
> factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
> collaborative filtering
> regexconverter: : Convert text files on a per line basis based on
> regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
> Regards,
> Mahmood
>
>
> On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> please give that a try.
>
>
> On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <
> nt_mahmood@yahoo.com> wrote:
>
> >mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0, which command is correct?
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
>
>
> On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
> Not sure what's so disappointing here, it was never officially announced
> that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
>
> On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> That is rather disappointing....
>
> >b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
>
> Regards,
> Mahmood
>
>
> On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
> The example as documented on the Wiki should work. The issue is that you seem
> to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1 profile
> on a Hadoop 2.3 environment. I don't think that's gonna work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
>
>
>
> On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> FYI, I am trying to complete the wikipedia example from Apache's document
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> In fact, see this file
> src/conf/driver.classes.default.props
>
> which is not exactly as what you said. Still I have the same problem.
> Please see the complete log
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to
> text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at
> java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 17:19:04 WARN driver.MahoutDriver: No
> wikipediaXMLSplitter.props found on classpath, will use command-line
> arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> arff.vector: : Generate Vectors from an ARFF file or directory
> baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> canopy: : Canopy clustering
> cat: : Print a file or resource as the logistic regression models would
> see it
> cleansvd: : Cleanup and verification of SVD output
> clusterdump: : Dump cluster output to text
> clusterpp: : Groups Clustering Output In Clusters
> cmdump: : Dump confusion matrix in HTML or text formats
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE
> and MAE of a rating matrix factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
> collaborative filtering
> regexconverter: : Convert text files on a per
> line basis based on regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of
> Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
>
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Mehmood,
>
> wikipediaXMLSplitter is not present in driver.classes.default.props. To
> accomplish what u r trying to do, u can edit
> src/conf/driver.classes.default.props and add an entry for
> wikipediaXMLSplitter.
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke via:
>
> mahout wikipediaXmlSplitter -d<path> -o<path> -c64
>
> please give that a try.
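[Editor's aside, not part of the original mail: Suneel's two steps above can be sketched as a small script. It runs against a scratch copy of the properties file so it is safe anywhere; the final mahout invocation is left commented out because it needs a real Mahout/Hadoop install, and the paths in it are the ones used elsewhere in this thread.]

```shell
# Step 1: append the mapping line to (a scratch stand-in for) the props file,
# normally src/conf/driver.classes.default.props in the Mahout 0.9 tree.
props=$(mktemp)
echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' >> "$props"

# Step 2: invoke via the alias just registered. Note the lower-case "Xml":
# the program name must match the alias exactly.
# bin/mahout wikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml \
#   -o wikipedia/chunks -c 64

grep -c 'wikipediaXmlSplitter' "$props"   # 1: the entry is in place
```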
>
>
>
>
>
>
>
>
> On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> Hi
> When I run
>
> mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> I get this error
> 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native
> Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at
> java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
>
>
> However the wikipediaXMLSplitter exists in
>
> mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
>
> I know that it is possible to pass the full path but is there any way to
> define a variable that points to the correct location? Something like
>
> export
> WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
>
> Where should I add that?
>
>
> Regards,
> Mahmood
Re: mahout command
Posted by Suneel Marthi <su...@yahoo.com>.
You can ignore the warnings.
On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Oh yes... Thanks Andrew you are right
Meanwhile I see two warnings
WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Is there any concern about them?
R.egards,
Mahmood
On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <su...@yahoo.com> wrote:
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0
On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <an...@gmail.com> wrote:
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.
Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt...@yahoo.com>wrote:
> No success Suneel...
>
> Please see the attachment which is the output of
> mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
> -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
>
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at
java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
>
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> arff.vector: : Generate Vectors from an ARFF file or directory
> baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> canopy: : Canopy clustering
> cat: : Print a file or resource as the logistic regression models would
> see it
> cleansvd: : Cleanup and verification of SVD output
> clusterdump: : Dump cluster output to text
> clusterpp: : Groups Clustering Output In Clusters
> cmdump: : Dump confusion matrix in HTML or text formats
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE and MAE of a rating matrix
> factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the
item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
>
collaborative filtering
> regexconverter: : Convert text files on a per line basis based on
> regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence
files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value
Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi
decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
> Regards,
> Mahmood
>
>
> On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> please give that a try.
>
>
> On Saturday, March 8, 2014 9:56
AM, Mahmood Naderan <
> nt_mahmood@yahoo.com> wrote:
>
> >mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0 which command is correct
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
>
>
> On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
> Not sure what's so disappointing here, it was never officially announced
> that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
>
> On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> That is rather disappointing....
>
> >b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
>
> Regards,
> Mahmood
>
>
> On Friday, March 7, 2014 8:09
PM, Suneel Marthi <su...@yahoo.com>
> wrote:
> The example as documented on the Wiki should work. The issue u seem to
> be running Mahout 0.9 distro that was built with hadoop 1.2.1 profile on a
> Hadoop 2.3 environment. I don't think that's gonna work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
>
>
>
> On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> FYI, I am trying to complete the wikipedia example from Apache's document
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> In fact, see this file
>
src/conf/driver.classes.default.props
>
> which is not exactly as what you said. Still I have the same problem.
> Please see the complete log
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to
> text
> org.apache.mahout.utils.SequenceFileDumper =
seqdumper : Generic Sequence
> File dumper
>
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at
> java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at
org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 17:19:04 WARN driver.MahoutDriver: No
> wikipediaXMLSplitter.props found on classpath, will use command-line
> arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> arff.vector: : Generate Vectors from an ARFF file or directory
> baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> canopy: : Canopy clustering
> cat: : Print a file or resource as the logistic regression models would
> see it
> cleansvd: : Cleanup and verification of SVD output
> clusterdump: : Dump cluster output to text
> clusterpp: : Groups Clustering Output In Clusters
> cmdump: : Dump confusion matrix in HTML or text formats
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE
> and MAE of a rating matrix factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
> collaborative filtering
> regexconverter: : Convert text files on a per
> line basis based on regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of
> Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
>
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Mehmood,
>
> wikipediaXMLSplitter is not present in driver.classes.default.props. To
> accomplish what you are trying to do, you can edit
> src/conf/driver.classes.default.props and add an entry for
> wikipediaXMLSplitter.
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke via:
>
> mahout wikipediaXmlSplitter -d<path> -o<path> -c64
>
> please give that a try.
>
>
>
>
>
>
>
>
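Concretely, Suneel's suggestion amounts to appending one line to the driver properties file. A sketch (file name and layout assumed from the 0.9 distribution; a scratch copy is used here so the snippet stands alone, whereas in a real checkout you would edit src/conf/driver.classes.default.props):

```shell
# Scratch stand-in for src/conf/driver.classes.default.props (assumed 0.9 layout).
props=driver.classes.default.props.example
printf '%s\n' \
  'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' \
  > "$props"

# Extract the short program name (the field between '=' and ':'):
awk -F'[=:]' 'NF >= 3 { gsub(/ /, "", $2); print $2 }' "$props"   # prints: wikipediaXmlSplitter
rm -f "$props"
```

The token between `=` and `:` is the short name that `bin/mahout <name>` resolves against, which is why it must be typed on the command line exactly as declared.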
> On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> Hi
> When I run
>
> mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> I get this error
> 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
>
>
> However the wikipediaXMLSplitter exists in
>
> mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
>
> I know that it is possible to pass the full path but is there any way to
> define a variable that points to the correct location. Something like
>
> export
> WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
>
> Where should I add that?
>
>
> Regards,
> Mahmood
>
>
>
>
>
>
>
>
>
>
>
>
>
Re: mahout command
Posted by Mahmood Naderan <nt...@yahoo.com>.
Oh yes... Thanks Andrew, you are right.
Meanwhile I see two warnings
WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Is there any concern about them?
Regards,
Mahmood
On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <su...@yahoo.com> wrote:
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0
On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <an...@gmail.com> wrote:
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.
Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
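Andrew's point can be verified directly: the driver's lookup of the short program name is an exact string match, so case matters. A self-contained check (a scratch file stands in for the real driver.classes.default.props):

```shell
props=$(mktemp)
echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' > "$props"

# Name as declared in the file -- one matching line:
grep -c 'wikipediaXmlSplitter :' "$props"          # prints 1

# Name as typed on the command line (upper-case XML) -- no match:
grep -c 'wikipediaXMLSplitter :' "$props" || true  # prints 0
rm -f "$props"
```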
On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt...@yahoo.com>wrote:
> No success Suneel...
>
> Please see the attachment which is the output of
> mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
> -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
>
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:186)
> at
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props
> found on classpath, will use command-line arguments only
>
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> arff.vector: : Generate Vectors from an ARFF file or directory
> baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> canopy: : Canopy clustering
> cat: : Print a file or resource as the logistic regression models would
> see it
> cleansvd: : Cleanup and verification of SVD output
> clusterdump: : Dump cluster output to text
> clusterpp: : Groups Clustering Output In Clusters
> cmdump: : Dump confusion matrix in HTML or text formats
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE and MAE of a rating matrix
> factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
> collaborative filtering
> regexconverter: : Convert text files on a per line basis based on
> regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
> Regards,
> Mahmood
>
>
> On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> please give that a try.
>
>
> On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <
> nt_mahmood@yahoo.com> wrote:
>
> >mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0 which command is correct
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
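To spell out the Maven syntax behind the question above: `-D` defines a system property as `-D<name>=<value>`, so the property name `hadoop2.version` is fixed and only the value changes. The second command is therefore the right one:

```shell
# Property name is literal; substitute only the version value:
mvn clean package -Dhadoop2.version=2.3.0
```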
>
>
> On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <
> suneel_marthi@yahoo.com> wrote:
> Not sure what's so disappointing here, it was never officially announced
> that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
>
> On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> That is rather disappointing....
>
> >b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
>
> Regards,
> Mahmood
>
>
> On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
> The example as documented on the Wiki should work. The issue is that you
> seem to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1
> profile on a Hadoop 2.3 environment. I don't think that's going to work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
>
>
>
> On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
> FYI, I am trying to complete the wikipedia example from Apache's document
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com>
> wrote:
>
> In fact, see this file
> src/conf/driver.classes.default.props
>
> which is not exactly as what you said. Still I have the same problem.
> Please see the complete log
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5
> src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors
> from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump
> cluster output to
> text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence
> File dumper
>
>
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class:
> wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
Re: mahout command
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0
> concatmatrices: : Concatenates 2 matrices of same cardinality into a
> single matrix
> cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> evaluateFactorization: : compute RMSE
> and MAE of a rating matrix factorization against probes
> fkmeans: : Fuzzy K-means clustering
> hmmpredict: : Generate random sequence of observations by given HMM
> itemsimilarity: : Compute the item-item-similarities for item-based
> collaborative filtering
> kmeans: : K-means clustering
> lucene.vector: : Generate Vectors from a Lucene index
> lucene2seq: : Generate Text SequenceFiles from a Lucene index
> matrixdump: : Dump matrix in CSV format
> matrixmult: : Take the product of two matrices
> parallelALS: : ALS-WR factorization of a rating matrix
> qualcluster: : Runs clustering experiments and summarizes results in a
> CSV
> recommendfactorized: : Compute recommendations using the factorization
> of a rating matrix
> recommenditembased: : Compute recommendations using item-based
> collaborative filtering
> regexconverter: : Convert text files on a per
> line basis based on regular expressions
> resplit: : Splits a set of SequenceFiles into a number of equal splits
> rowid: : Map SequenceFile<Text,VectorWritable> to
> {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> rowsimilarity: : Compute the pairwise similarities of the rows of a
> matrix
> runAdaptiveLogistic: : Score new production data using a probably
> trained and validated AdaptivelogisticRegression model
> runlogistic: : Run a logistic regression model against CSV data
> seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> seq2sparse: : Sparse Vector generation from Text sequence files
> seqdirectory: : Generate sequence files (of Text) from a directory
> seqdumper: : Generic Sequence File dumper
> seqmailarchives: : Creates SequenceFile from a directory containing
> gzipped mail archives
>
> seqwiki: : Wikipedia xml dump to sequence file
> spectralkmeans: : Spectral k-means clustering
> split: : Split Input data into test and train sets
> splitDataset: : split a rating dataset into training and probe parts
> ssvd: : Stochastic SVD
> streamingkmeans: : Streaming k-means clustering
> svd: : Lanczos Singular Value Decomposition
> testnb: : Test the Vector-based Bayes classifier
> trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> trainlogistic: : Train a logistic regression using stochastic gradient
> descent
> trainnb: : Train the Vector-based Bayes classifier
> transpose: : Take the transpose of a matrix
> validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model
> against hold-out data set
> vecdist: : Compute the distances between a set of Vectors (or Cluster or
> Canopy, they must fit in memory) and a list of
> Vectors
> vectordump: : Dump vectors from a sequence file to text
> viterbi: : Viterbi decoding of hidden states from given output states
> sequence
> wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
>
>
>
>
>
> Regards,
> Mahmood
>
>
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Mahmood,
>
> wikipediaXmlSplitter is not present in driver.classes.default.props. To
> accomplish what you are trying to do, you can edit
> src/conf/driver.classes.default.props and add an entry for
> wikipediaXmlSplitter.
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
> wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke via:
>
> mahout wikipediaXmlSplitter -d<path> -o<path> -c64
>
> please give that a try.
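A minimal sketch of that edit, using a scratch file in place of the real src/conf/driver.classes.default.props (the key added is the lower-case wikipediaXmlSplitter, matching the entry shown above):

```shell
# Append the driver entry (here to a temporary stand-in for
# src/conf/driver.classes.default.props in the Mahout tree).
PROPS=$(mktemp)
echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' >> "$PROPS"

# Verify the key is present; MahoutDriver matches it verbatim.
grep -c 'wikipediaXmlSplitter :' "$PROPS"   # -> 1
```

With the real file edited, `bin/mahout wikipediaXmlSplitter -d <path> -o <path> -c 64` should then resolve (note the key's case).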
Re: mahout command
Posted by Andrew Musselman <an...@gmail.com>.
You have upper-case "XML" in your command but lower-case "Xml" in the
declaration in the properties file; make the two match and it should work.
Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter =
wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter
-d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
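The mismatch can be checked mechanically; this sketch uses a scratch copy of the props entry (MahoutDriver resolves the program name by a plain, case-sensitive string match against these keys):

```shell
PROPS=$(mktemp)
echo 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' > "$PROPS"

# Declared key (lower-case "Xml") is found:
grep -c 'wikipediaXmlSplitter' "$PROPS"           # -> 1

# Key as typed on the command line (upper-case "XML") is not, which is
# exactly why MahoutDriver reports "Unknown program ... chosen.":
grep -c 'wikipediaXMLSplitter' "$PROPS" || true   # -> 0
```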
Re: mahout command
Posted by Mahmood Naderan <nt...@yahoo.com>.
No success Suneel...
Please see the attachment which is the output of
mvn clean package -Dhadoop2.version=2.3.0
Additionally:
hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$
Regards,
Mahmood
On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <su...@yahoo.com> wrote:
mvn clean package -Dhadoop2.version=2.3.0
please give that a try.
On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
>mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
Excuse me, if I have 2.3.0, which command is correct?
mvn clean package -Dhadoop2.3.0.=2.3.0
mvn clean package -Dhadoop2.version=2.3.0
Regards,
Mahmood
On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <su...@yahoo.com> wrote:
Not sure what's so disappointing here, it was never officially announced that Mahout 0.9 had Hadoop 2.x support.
From trunk, can you build mahout for hadoop2 using this command:
mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
That is rather disappointing....
>b) Work off of present Head and build with Hadoop 2.x profile.
Can you explain more?
Regards,
Mahmood
On Friday, March 7, 2014 8:09 PM, Suneel Marthi <su...@yahoo.com> wrote:
The example as documented on the Wiki should work. The issue u seem to be running Mahout 0.9 distro that was built with hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's gonna work.
Suggest that you either:
a) Switch to a
Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile.
Mahout 0.9 is not certified for Hadoop 2.x.
On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Regards,
Mahmood
On Friday,
March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
In fact, see this file:
src/conf/driver.classes.default.props
It is not exactly what you said. Still, I have the same problem. Please see the complete log:
hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$
Regards,
Mahmood
On Friday, March 7, 2014 5:02 PM, Suneel Marthi <su...@yahoo.com> wrote:
Mehmood,
wikipediaXMLSplitter is not present in driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXmlSplitter.
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
You should then be able to invoke via:
mahout wikipediaXmlSplitter -d<path> -o<path> -c64
please give that a try.
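A minimal sketch of the fix above, run against a scratch copy of the props file (the class and short name are taken from this thread; nothing here touches a real Mahout install). It also illustrates why the original command fails even once the entry exists: the driver's program-name lookup is exact, and `wikipediaXMLSplitter` does not match the registered `wikipediaXmlSplitter`.

```shell
# Hedged sketch: append the driver mapping to a scratch props file, then
# show that the lookup key is case-sensitive. The greps only mimic the
# effect of the driver's name lookup; this is not Mahout's actual code.
PROPS=driver.classes.default.props.tmp
printf '%s\n' \
  'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' \
  > "$PROPS"

# The name as typed in the failing command: no match (wrong case).
grep -q '= wikipediaXMLSplitter ' "$PROPS" || echo "Unknown program 'wikipediaXMLSplitter'"
# The name as registered in the props file: matches.
grep -q '= wikipediaXmlSplitter ' "$PROPS" && echo "found wikipediaXmlSplitter"
```

Per the warning in the log, the real resolution happens in MahoutDriver, which looks the props file up on the job jar's classpath.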
On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hi
When I run
mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
I get this error
14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
However, WikipediaXmlSplitter exists in
mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like:
export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
Where should I add that?
Regards,
Mahmood
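On the question above about where to define such a variable: a hedged sketch, assuming a bash-style login shell. Note that the variable only saves typing the path; it would not make the driver recognize the program name, which is the driver.classes.default.props issue discussed elsewhere in this thread.

```shell
# Hedged sketch: a convenience variable for the source path. $HOME is used
# here as an example prefix; the original message used a relative path.
export WIKI="$HOME/mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia"
echo "$WIKI"
# To persist it across shells, the export line could go in ~/.bashrc or
# ~/.profile (assumption: a bash-style shell; the default on Solaris may differ).
```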
Re: mahout command
Posted by Suneel Marthi <su...@yahoo.com>.
The example as documented on the Wiki should work. The issue u seem to be running Mahout 0.9 distro that was built with hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's gonna work.
Suggest that you either:
a) Switch to a Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile.
Mahout 0.9 is not certified for Hadoop 2.x.
On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Regards,
Mahmood
On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
In fact, see this file
src/conf/driver.classes.default.props
which is not exactly as what you said. Still I have the same problem. Please see the complete log
hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to
text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at
java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No
wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump:
: Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE
and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate
Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per
line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to
{SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$
Regards,
Mahmood
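Note the last entry in the listing above: the registered program name is wikipediaXmlSplitter ("Xml", not "XML"), while the command invoked wikipediaXMLSplitter. The driver matches program names as exact strings, so the lookup fails on case alone. A minimal shell illustration of the mismatch, using a throwaway copy of the mapping rather than the real props file:

```shell
# Write a throwaway copy of the driver mapping line from this thread.
printf 'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter\n' > /tmp/driver.props

# The name as typed (capital XML) matches nothing -- this is the lookup that fails:
grep -c 'wikipediaXMLSplitter' /tmp/driver.props   # prints 0

# The registered name (Xml) matches:
grep -c 'wikipediaXmlSplitter' /tmp/driver.props   # prints 1
```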
Re: mahout command
Posted by Mahmood Naderan <nt...@yahoo.com>.
FYI, I am trying to complete the Wikipedia example from Apache's documentation:
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Regards,
Mahmood
Re: mahout command
Posted by Suneel Marthi <su...@yahoo.com>.
Mahmood,
wikipediaXMLSplitter is not present in driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for WikipediaXmlSplitter:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
You should then be able to invoke it via:
mahout wikipediaXmlSplitter -d <path> -o <path> -c 64
Please give that a try.
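The steps above can be sketched as a small shell snippet. The MAHOUT_HOME, DUMP_XML, and OUT_DIR variables and the idempotent-append guard are my additions for illustration; the props path and the entry line itself are from this thread:

```shell
# Sketch, assuming MAHOUT_HOME is the root of the mahout-distribution-0.9 tree.
PROPS="$MAHOUT_HOME/src/conf/driver.classes.default.props"
ENTRY='org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter'

# Append the mapping only if it is not already present (safe to re-run).
grep -qF "$ENTRY" "$PROPS" || printf '%s\n' "$ENTRY" >> "$PROPS"

# Invoke with the exact, case-sensitive program name from the entry.
# DUMP_XML and OUT_DIR are placeholders for your input dump and output directory.
"$MAHOUT_HOME/bin/mahout" wikipediaXmlSplitter -d "$DUMP_XML" -o "$OUT_DIR" -c 64
```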