Posted to user@mahout.apache.org by Natalia Connolly <na...@gmail.com> on 2014/03/19 17:02:25 UTC
Problem with mahout seqdirectory
Hello,
I have mahout 0.9 and a single-node Hadoop 1.2.1 running on a Mac.
I am trying to create a bunch of vectors for clustering from a
collection of text documents. So I did:
$MAHOUT_HOME/bin/mahout seqdirectory --input
/Users/hadoop/fuzzyjoin-results/NOTES/progress_notes --output
/tmp/mahout-vectors/
However, this gives me an error:
Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /Users/hadoop/fuzzyjoin-results/NOTES/progress_notes
even though the directory definitely exists and contains lots of files.
After a lot of googling I found that if I add "-xm sequential" to the
above command, it does not complain; however, the output directory
(/tmp/mahout-vectors) is empty.
Any help would be appreciated.
Thank you,
Natalia Connolly
Re: Problem with mahout seqdirectory
Posted by Pavan Kumar N <pa...@gmail.com>.
Hi Natalia,
It appears you are referencing files in your local file system rather than
files in HDFS. To run Mahout on Hadoop, the input has to be read from HDFS,
and ideally the output should be written to an HDFS location as well. Here's
how I would run it:
mahout seqdirectory --input
hdfs://localhost:54311/user/hadoop/fuzzyjoin-results/NOTES/progress_notes
--output hdfs://localhost:54311/user/desired-path
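Maybe you can try this. Use the host and port from fs.default.name in your
core-site.xml for the hdfs:// prefix. Before that, you need to copy the files
into HDFS. You can do that with either "bin/hadoop fs -copyFromLocal" or
"bin/hadoop fs -put" ("fs -get" goes the other way, from HDFS to local).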
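Roughly, the whole sequence could look like the sketch below. The local path
is the one from your message; the HDFS paths (HDFS_IN, HDFS_OUT) and the
"run" helper are names I made up for illustration, and I am assuming that
/user/hadoop already exists in HDFS and that "hadoop" is on your PATH. By
default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: stage local documents in HDFS, then run seqdirectory on the HDFS copy.
# LOCAL_DIR is taken from the original message.
# HDFS_IN and HDFS_OUT are hypothetical paths; adjust them to your cluster.
LOCAL_DIR=/Users/hadoop/fuzzyjoin-results/NOTES/progress_notes
HDFS_IN=/user/hadoop/progress_notes
HDFS_OUT=/user/hadoop/mahout-seqfiles

# Hypothetical helper: prints the command unless DRY_RUN=0, then executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Copy the local directory into HDFS (assumes /user/hadoop exists there).
run hadoop fs -put "$LOCAL_DIR" "$HDFS_IN"

# Check that the files actually arrived.
run hadoop fs -ls "$HDFS_IN"

# Paths without a scheme are resolved against the default file system
# configured for Hadoop, so these point at HDFS rather than the local disk.
run "$MAHOUT_HOME/bin/mahout" seqdirectory --input "$HDFS_IN" --output "$HDFS_OUT"

# The output should now be listed in HDFS, not under a local /tmp directory.
run hadoop fs -ls "$HDFS_OUT"
```

If the last listing comes back empty or the command fails, the output of the
first "fs -ls" is the thing to look at: it tells you whether the input ever
made it into HDFS.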
hope this works out for you.
Pavan