Posted to user@mahout.apache.org by Shrikar archak <sh...@gmail.com> on 2013/01/04 22:12:33 UTC

Seqdirectory not extracting text with mahout 0.7

Hi All,
I am trying to convert the Reuters text documents into sequence files. The
command doesn't return an error, but when I dump the output I get no keys
in the sequence files.

Shrikars-MacBook-Pro:mahout-distribution-0.7 shrikar$ *bin/mahout seqdirectory -c UTF-8 -i examples/reuters-extracted/ -o reuters-seqfiles*
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /Users/shrikar/Installed/hadoop/bin/hadoop and HADOOP_CONF_DIR=/Users/shrikar/Installed/hadoop/conf
MAHOUT-JOB: /Users/shrikar/Installed/mahout-distribution-0.7/examples/target/mahout-examples-0.7-job.jar
13/01/04 13:06:01 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[examples/reuters-extracted/], --keyPrefix=[], --output=[reuters-seqfiles], --startPhase=[0], --tempDir=[temp]}
2013-01-04 13:06:01.751 java[5549:1203] Unable to load realm info from SCDynamicStore
13/01/04 13:06:02 INFO driver.MahoutDriver: Program took 489 ms (Minutes: 0.00815)

Shrikars-MacBook-Pro:mahout-distribution-0.7 shrikar$ *bin/mahout seqdumper -i reuters-seqfiles/chunk-0*
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /Users/shrikar/Installed/hadoop/bin/hadoop and HADOOP_CONF_DIR=/Users/shrikar/Installed/hadoop/conf
MAHOUT-JOB: /Users/shrikar/Installed/mahout-distribution-0.7/examples/target/mahout-examples-0.7-job.jar
13/01/04 13:09:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[reuters-seqfiles/chunk-0], --startPhase=[0], --tempDir=[temp]}
2013-01-04 13:09:07.789 java[5619:1203] Unable to load realm info from SCDynamicStore
Input Path: reuters-seqfiles/chunk-0
*Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text*
*Count: 0*
13/01/04 13:09:08 INFO driver.MahoutDriver: Program took 452 ms (Minutes: 0.007533333333333334)

Am I missing something? The reuters-extracted directory does contain all
the text files; I have verified that.
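One thing worth ruling out: the log reports that MAHOUT_LOCAL is not set, so
the job runs on Hadoop and reads its input through the filesystem configured
under HADOOP_CONF_DIR. If that configuration points at HDFS, a relative path
such as examples/reuters-extracted/ would typically resolve under the user's
HDFS home directory rather than the local directory that was checked. The
sketch that follows is a minimal, hypothetical pre-flight check (check_input
is not part of Mahout), assuming a POSIX shell and, optionally, the `hadoop`
command on the PATH.

```shell
#!/bin/sh
# check_input is a hypothetical helper (not part of Mahout): it compares the
# local view of an input directory with what the Hadoop default filesystem
# reports for the same path.
check_input() {
  input="$1"

  # Local view: count regular files under the directory (0 if it is missing).
  if [ -d "$input" ]; then
    count=$(find "$input" -type f | wc -l | tr -d ' ')
  else
    count=0
  fi
  echo "local files: $count"

  # Hadoop view: with MAHOUT_LOCAL unset, Mahout reads from the filesystem
  # configured under HADOOP_CONF_DIR, where a relative path may resolve to a
  # different location (e.g. under the user's HDFS home directory).
  if command -v hadoop >/dev/null 2>&1; then
    echo "hadoop view of $input:"
    hadoop fs -ls "$input" || echo "not found on the Hadoop default filesystem"
  fi
}
```

Usage: `check_input examples/reuters-extracted/`. If the local count is
non-zero but the Hadoop listing is empty or missing, the input would need to
be copied to HDFS (for example with `hadoop fs -put`), or bin/mahout run in
local mode by setting MAHOUT_LOCAL to a non-empty value.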

Thanks,
Shrikar