Posted to user@mahout.apache.org by Natalia Connolly <na...@gmail.com> on 2014/03/19 17:02:25 UTC

Problem with mahout seqdirectory

Hello,

   I have mahout 0.9 and a single-node Hadoop 1.2.1 running on a Mac.

   I am trying to create a bunch of vectors for clustering from a
collection of text documents.  So I did:

$MAHOUT_HOME/bin/mahout seqdirectory --input
/Users/hadoop/fuzzyjoin-results/NOTES/progress_notes --output
/tmp/mahout-vectors/

    However, this gives me an error:

Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /Users/hadoop/fuzzyjoin-results/NOTES/progress_notes

even though the directory definitely exists and contains lots of files.

    After a lot of googling I found that if I add "-xm sequential" to the
above command, it does not complain; however, the output directory
(/tmp/mahout-vectors) is empty.

      Any help would be appreciated.

      Thank you,
      Natalia Connolly

Re: Problem with mahout seqdirectory

Posted by Pavan Kumar N <pa...@gmail.com>.
Hi Natalia,

It appears you are referencing files on your local file system instead of
files in HDFS. If you want to run Mahout under Hadoop, the input must be
stored in HDFS, and ideally the output should be written to an HDFS location
as well. Here's how I would run it:

mahout seqdirectory --input
hdfs://localhost:54311/user/hadoop/fuzzyjoin-results/NOTES/progress_notes
--output hdfs://localhost:54311/user/desired-path

Maybe you can try this. Before that, you need to transfer the files into
HDFS. You can do that with either "bin/hadoop fs -copyFromLocal" or
"bin/hadoop fs -put" (note that "fs -get" copies in the opposite direction,
from HDFS to the local file system).
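
As a rough sketch of the whole sequence (copy into HDFS, convert, check the
output), something like the following might work. The local path is the one
from the original message; the two HDFS paths are hypothetical placeholders,
and the script assumes that "hadoop" and "mahout" are on the PATH and that
fs.default.name points at the HDFS NameNode, so that scheme-less paths resolve
against HDFS. By default it only prints the commands; set DRY_RUN=0 to
actually run them.

```shell
#!/bin/sh
# Sketch only: the HDFS paths below are assumptions, adjust them to your setup.
LOCAL_DIR="/Users/hadoop/fuzzyjoin-results/NOTES/progress_notes"  # local input (from the original post)
HDFS_IN="/user/hadoop/progress_notes"       # hypothetical HDFS input directory
HDFS_OUT="/user/hadoop/mahout-seqfiles"     # hypothetical HDFS output directory

# Print commands instead of executing them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Copy the local documents into HDFS.
run hadoop fs -put "$LOCAL_DIR" "$HDFS_IN"

# 2. Confirm that the files arrived.
run hadoop fs -ls "$HDFS_IN"

# 3. Convert the documents to SequenceFiles, reading from and writing to HDFS.
run mahout seqdirectory --input "$HDFS_IN" --output "$HDFS_OUT"

# 4. Inspect the result in HDFS rather than on the local disk.
run hadoop fs -ls "$HDFS_OUT"
```

Step 4 is worth keeping in mind: when Hadoop is configured for HDFS, an
output path such as /tmp/mahout-vectors refers to HDFS as well, so it is best
to look for the results with "hadoop fs -ls" first.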

Hope this works out for you.

Pavan

