Posted to dev@mahout.apache.org by Divya <di...@k2associates.com.sg> on 2010/10/27 08:07:05 UTC

Configuration queries regarding Mahout

Hi,

I have a few queries regarding Mahout.

Requirement:

My requirement is:

- I need to identify similar documents using Mahout, where my input will be an XML file like the Wikipedia input.
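
As I understand it, SequenceFilesFromDirectory treats each file in a directory as one document, so a Wikipedia-style XML file would first need to be split into one text per page. A minimal plain-Java sketch of that step, using the JDK's StAX parser, follows. It assumes a simplified layout where each <page> holds a <title> and a text-only <text> element; the class and method names (WikiPageExtractor, extractPages) are hypothetical and not part of Mahout.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: splits a simplified Wikipedia-style dump into title -> text.
public class WikiPageExtractor {

    /**
     * Returns title -> body text for every <page>, in document order.
     * Assumes each page has a <title> followed (possibly nested) by a text-only <text>.
     */
    public static Map<String, String> extractPages(String xml) {
        Map<String, String> pages = new LinkedHashMap<String, String>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String title = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only element starts matter here
                }
                String name = reader.getLocalName(); // ignores any namespace prefix
                if ("page".equals(name)) {
                    title = null; // new page: forget the previous title
                } else if ("title".equals(name)) {
                    title = reader.getElementText();
                } else if ("text".equals(name) && title != null) {
                    pages.put(title, reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return pages;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki>"
                + "<page><title>Apple</title><revision><text>Apples are a fruit.</text></revision></page>"
                + "<page><title>Pear</title><revision><text>Pears are a fruit too.</text></revision></page>"
                + "</mediawiki>";
        // Each entry could then be written to its own file as input for SequenceFilesFromDirectory.
        for (Map.Entry<String, String> page : extractPages(xml).entrySet()) {
            System.out.println(page.getKey() + " -> " + page.getValue());
        }
    }
}
```

A streaming parser is used rather than a DOM so that a large dump does not have to fit in memory at once.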

Configuration questions

Which Mahout build do I need to download? I can see the following:

mahout-0.3.zip <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.zip>
mahout-0.3-src.zip <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3-src.zip>
mahout-0.3.tar.bz2 <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.tar.bz2>

Which one should I download to work with?

Configuration steps I am following:

I have configured Hadoop, Cygwin and Mahout 0.3 on Windows following
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

HADOOP_CONF_DIR = C:\cygwin\home\Divya\hadoop-0.19.2\conf
HADOOP_HOME = C:\cygwin\home\Divya\hadoop-0.19.2
JAVA_HOME = D:\InstalledSoftwares\Java\Java6\jdk1.6.0_21
JAVA_OPTS = -XX:MaxPermSize=128m
MAHOUT_HOME = D:\mahout-0.4
MAHOUT_OPTS = -Xmx1024m
MAVEN_HOME = D:\Downloads\Mahout\maven\apache-maven-2.2.1
Path = C:\cygwin\bin

Steps I am following to compute document similarity:

1. SequenceFilesFromDirectory to generate sequence files
2. SparseVectorsFromSequenceFiles to generate vectors
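
To make the second step concrete: one weighting SparseVectorsFromSequenceFiles supports is TF-IDF, which scores a term highly when it is frequent in one document but rare across the collection. The sketch below shows that idea in miniature on in-memory strings, using the textbook tf * log(N / df) form. Mahout's exact formula, tokenizer and Hadoop-based storage differ, and the names (TfIdfSketch, tokenize, tfidf) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of TF-IDF weighting; not Mahout's implementation.
public class TfIdfSketch {

    /** Lower-cases and splits on runs of non-letter, non-digit characters (a crude stand-in for a real analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns one sparse term -> weight map per document, weighted tf * log(N / df). */
    public static List<Map<String, Double>> tfidf(List<String> docs) {
        List<Map<String, Integer>> termFreqs = new ArrayList<Map<String, Integer>>();
        Map<String, Integer> docFreq = new HashMap<String, Integer>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<String, Integer>();
            for (String term : tokenize(doc)) {
                Integer c = tf.get(term);
                tf.put(term, c == null ? 1 : c + 1);
            }
            // Each distinct term counts once towards its document frequency.
            for (String term : tf.keySet()) {
                Integer c = docFreq.get(term);
                docFreq.put(term, c == null ? 1 : c + 1);
            }
            termFreqs.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<Map<String, Double>>();
        for (Map<String, Integer> tf : termFreqs) {
            Map<String, Double> vec = new HashMap<String, Double>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                // Terms found in every document get idf 0; they are dropped to keep the vector sparse.
                if (idf > 0) {
                    vec.put(e.getKey(), e.getValue() * idf);
                }
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Apples are a fruit.", "Pears are a fruit too.", "Hadoop runs on Cygwin.");
        List<Map<String, Double>> vectors = tfidf(docs);
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc" + i + " " + vectors.get(i));
        }
    }
}
```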

Now I am stuck on which utility to use to compute document similarity, and on how to then convert the result into a human-readable format.
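
Whichever utility computes the similarities, a common measure for comparing TF-IDF vectors is cosine similarity: the dot product of two vectors divided by the product of their lengths, so 1 means proportional weights and 0 means no shared terms. A minimal in-memory sketch follows, assuming each document is already a sparse term -> weight map; it also prints each document's nearest neighbour as a readable line. The names (CosineSimilaritySketch, cosine, mostSimilar) are hypothetical and this is not Mahout's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of cosine similarity over sparse term -> weight maps.
public class CosineSimilaritySketch {

    /** Cosine of the angle between two sparse vectors; 0 if either vector is empty or all zeros. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map so the dot product costs O(min(|a|, |b|)) lookups.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /** One readable line per document: "name -> most similar other document (score)". */
    public static List<String> mostSimilar(List<String> names, List<Map<String, Double>> vectors) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < vectors.size(); i++) {
            int best = -1;
            double bestScore = -1.0;
            for (int j = 0; j < vectors.size(); j++) {
                if (i == j) {
                    continue; // skip comparing a document with itself
                }
                double s = cosine(vectors.get(i), vectors.get(j));
                if (s > bestScore) {
                    bestScore = s;
                    best = j;
                }
            }
            if (best >= 0) {
                lines.add(String.format(Locale.ROOT, "%s -> %s (%.3f)",
                        names.get(i), names.get(best), bestScore));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy weights for illustration only.
        Map<String, Double> apple = new HashMap<String, Double>();
        apple.put("apple", 1.1);
        apple.put("fruit", 0.4);
        Map<String, Double> pear = new HashMap<String, Double>();
        pear.put("pear", 1.1);
        pear.put("fruit", 0.4);
        Map<String, Double> hadoop = new HashMap<String, Double>();
        hadoop.put("hadoop", 1.1);
        hadoop.put("cygwin", 1.1);
        List<Map<String, Double>> vectors = new ArrayList<Map<String, Double>>();
        vectors.add(apple);
        vectors.add(pear);
        vectors.add(hadoop);
        for (String line : mostSimilar(Arrays.asList("apple.txt", "pear.txt", "hadoop.txt"), vectors)) {
            System.out.println(line);
        }
    }
}
```

Comparing every pair is quadratic in the number of documents, so this only suits small collections; a large corpus such as a Wikipedia dump would need a distributed job instead.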

Can anyone help me here?

Regards,

Divya