Posted to dev@mahout.apache.org by Divya <di...@k2associates.com.sg> on 2010/10/27 08:07:05 UTC
Configuration queries regarding mahout
Hi,
I have a few queries regarding Mahout.
Requirement:
I need to find similar documents using Mahout. My input
will be an XML file, like the Wikipedia input.
Configuration doubts:
Which Mahout build do I need to download? I can see:
mahout-0.3.zip <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.zip>
mahout-0.3-src.zip
<http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3-src.zip>
mahout-0.3.tar.bz2
<http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.tar.bz2>
Which one should I download to work with?
Configuration steps I am following:
I have configured Hadoop, Cygwin and Mahout 0.3 on Windows,
following the guide at http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/
HADOOP_CONF_DIR = C:\cygwin\home\Divya\hadoop-0.19.2\conf
HADOOP_HOME = C:\cygwin\home\Divya\hadoop-0.19.2
JAVA_HOME = D:\InstalledSoftwares\Java\Java6\jdk1.6.0_21
JAVA_OPTS = -XX:MaxPermSize=128m
MAHOUT_HOME = D:\mahout-0.4
MAHOUT_OPTS = -Xmx1024m
MAVEN_HOME = D:\Downloads\Mahout\maven\apache-maven-2.2.1
Path = C:\cygwin\bin
Steps I am following to compute document similarity:
SequenceFilesFromDirectory to generate sequence files
SparseVectorsFromSequenceFiles to generate vectors
Now I am stuck: which utility should I use to compute document similarity?
And then, how do I convert the result to a human-readable format?
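To make clear what I mean by document similarity, here is a minimal sketch in plain Java. It is not Mahout code and does not use any Mahout API: the class name CosineSimilaritySketch, the method cosine, and the sample term weights are all hypothetical. It assumes each document has already been reduced to a map from term to weight, and it uses cosine similarity only as one common choice of measure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
public class CosineSimilaritySketch {

    // Cosine similarity between two sparse term-weight vectors.
    // Returns 0.0 if either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared terms contribute
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up term weights standing in for two vectorized documents.
        Map<String, Double> doc1 = new HashMap<String, Double>();
        doc1.put("mahout", 1.0);
        doc1.put("hadoop", 1.0);

        Map<String, Double> doc2 = new HashMap<String, Double>();
        doc2.put("mahout", 1.0);
        doc2.put("cygwin", 1.0);

        // Human-readable output: one line per document pair.
        System.out.println("doc1 vs doc2: " + cosine(doc1, doc2));
    }
}
```

What I am looking for is the Mahout utility that does this kind of pairwise comparison over the generated vectors, and a way to print the result as readable text like the last line above.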
Can anyone help me here?
Regards,
Divya