Posted to user@mahout.apache.org by da...@ontrenet.com on 2011/03/07 17:57:03 UTC
Complete canopy example?
Hi,
I have a directory of text documents that I want to cluster with canopy
clustering (Mahout 0.4, standalone, no Hadoop).
I'm having some difficulty doing this. Is there a complete example that
shows every step?
Here is what I do:
Step 1$ ./bin/mahout seqdirectory -i INPUT_FILES/ -o FEED_SEQ -c UTF-8 -chunk 5
# My INPUT_FILES directory contains 1000 text files, yet the output FEED_SEQ
contains only one tiny chunk with a file in it. Is that right?
Step 2$ ./bin/mahout seq2sparse -i FEED_SEQ -o FEED_VEC --maxNGramSize 3
# This seems to generate a bit of output, with no errors.
Step 3$ ./bin/mahout canopy -i FEED_VEC -o FEED_CENTS -t1 1500 -t2 2000
Exception in thread "main" java.io.FileNotFoundException: File file:/home/darren/Downloads/mahout-distribution-0.4/FEED_VEC/tokenized-documents/data does not exist.
----
The output of Step 1 looks suspicious to me:
$ ./bin/mahout seqdirectory -i INPUT_FILES/ -o FEED_SEQ -c UTF-8 -chunk 5
no HADOOP_HOME set, running locally
Mar 7, 2011 11:57:14 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 847 ms
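A quick way to check what Step 1 read and wrote is to compare file counts using standard shell tools only (no Mahout commands). This is a minimal sketch: count_files is a hypothetical helper name, and INPUT_FILES and FEED_SEQ are the directory names from the commands above.

```shell
# Hypothetical helper: count regular files under a directory (recursively,
# hidden files included).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Directory names are assumed to match the commands above; each check is
# skipped if the directory does not exist.
if [ -d INPUT_FILES ]; then
    echo "input files:  $(count_files INPUT_FILES)"
fi
if [ -d FEED_SEQ ]; then
    echo "output files: $(count_files FEED_SEQ)"
    ls -la FEED_SEQ   # sizes show whether the chunk is really tiny
fi
```

If the input count is 1000 but the output directory holds only one very small file, that would support the suspicion that Step 1 did not pick up the documents.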
----
Darren