Posted to commits@mahout.apache.org by co...@apache.org on 2011/10/02 15:17:00 UTC
[CONF] Apache Mahout > Creating Vectors from Weka's ARFF Format
Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Creating Vectors from Weka's ARFF Format (https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format)
Edited by Joe Prasanna Kumar:
---------------------------------------------------------------------
h1. Introduction
Mahout can now convert Weka's [ARFF|http://www.cs.waikato.ac.nz/~ml/weka/arff.html] (2.1) format to Mahout's Vector format.
h1. Running the Converter
ARFF files are converted with the org.apache.mahout.utils.arff.Driver program. To list its input arguments, run it with the \--help argument, which produces output similar to:
{noformat}
Usage:
 [--input <input> --output <output> --max <max> --help --dictOut <dictOut>
 --outputWriter <outputWriter> --delimiter <delimiter>]
Options
  --input (-d) input                 The file or directory containing the ARFF
                                     files. If it is a directory, all .arff
                                     files will be converted. (Mandatory parameter)
  --output (-o) output               The output directory. Files will have
                                     the same name as the input, but with the
                                     extension .mvc (Mandatory parameter)
  --max (-m) max                     The maximum number of vectors to output.
                                     If not specified, then it will loop over
                                     all docs (Optional parameter)
  --help (-h)                        Print out help (Optional parameter)
  --dictOut (-t) dictOut             The file to output the label bindings
                                     (Mandatory parameter)
  --outputWriter (-e) outputWriter   The VectorWriter to use, either seq
                                     (SequenceFileVectorWriter - default) or
                                     file (Writes to a File using JSON format)
                                     (Optional parameter)
  --delimiter (-l) delimiter         The delimiter for outputing the
                                     dictionary (Optional parameter)
{noformat}
Each parameter can be given in its long form, such as \--input, or by its equivalent short name, such as \-d. From here, running the Driver is as simple as pointing it at an ARFF file or a directory of ARFF files:
{noformat}
$MAHOUT_HOME/bin/mahout arff.vector -d ./content/reuters-modapte/ \
-t ./content/reuters-modapte/output/dict.txt -o ./content/reuters-modapte/output/convert
{noformat}
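The same conversion can also be written with the long option names listed in the help output. The following is only a sketch under stated assumptions: the working directory {{./arff-demo}}, the file name {{weather.arff}} and its contents are hypothetical sample data, and the optional \--max value of 2 is chosen purely for illustration. The conversion step is skipped when {{MAHOUT_HOME}} does not point at a Mahout installation.

```shell
#!/bin/sh
# Sketch only: WORK_DIR, weather.arff and its rows are made-up sample data.
set -eu

WORK_DIR="${WORK_DIR:-./arff-demo}"
mkdir -p "$WORK_DIR/output"

# A tiny ARFF file: @attribute declarations in the header, then @data rows.
cat > "$WORK_DIR/weather.arff" <<'EOF'
@relation weather

@attribute temperature numeric
@attribute humidity numeric
@attribute play {yes,no}

@data
85,85,no
80,90,no
83,86,yes
EOF

# Resolve the mahout launcher only if MAHOUT_HOME is set and non-empty.
MAHOUT_BIN="${MAHOUT_HOME:+$MAHOUT_HOME/bin/mahout}"

if [ -n "$MAHOUT_BIN" ] && [ -x "$MAHOUT_BIN" ]; then
  # Long-form equivalents of -d, -o and -t, plus the optional --max cap.
  "$MAHOUT_BIN" arff.vector \
    --input "$WORK_DIR" \
    --output "$WORK_DIR/output/convert" \
    --dictOut "$WORK_DIR/output/dict.txt" \
    --max 2
else
  echo "MAHOUT_HOME is not set or bin/mahout is not executable; skipping conversion" >&2
fi
```

Because \--input is a directory here, every .arff file in it would be converted, and the label bindings would be written to the file given by \--dictOut.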