Posted to user@mahout.apache.org by sampath <uk...@gmail.com> on 2012/08/15 18:22:02 UTC

Creating Mahout vector from existing vector

I already have (term, weight) data on which I want to run an LDA
analysis to find the topic distribution.

How should I create Mahout vectors from it?
The documentation says I can use VectorWriter, but I'm not sure how to go
about it.

*Converting existing vectors to Mahout's format*

If you are in the happy position to already own a document (as in: texts,
images or whatever item you wish to treat) processing pipeline, the question
arises of how to convert the vectors into the Mahout vector format. Probably
the easiest way to go would be to implement your own Iterable<Vector>
(called VectorIterable in the example below) and then reuse the existing
VectorWriter classes:

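// SequenceFile.createWriter returns a SequenceFile.Writer, not a VectorWriter,
// so wrap it; vectors are stored as VectorWritable values.
SequenceFile.Writer seqWriter = SequenceFile.createWriter(filesystem,
configuration, outfile, LongWritable.class, VectorWritable.class);
VectorWriter vectorWriter = new SequenceFileVectorWriter(seqWriter);
long numDocs = vectorWriter.write(new VectorIterable(), Long.MAX_VALUE);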
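
To make the Iterable idea from the quoted documentation concrete for (term, weight) data, here is a minimal sketch in plain Java. It deliberately does not use Mahout: SparseDoc and TermWeightIterable are hypothetical stand-ins invented for illustration, and the dictionary scheme (each distinct term gets the next free integer index) is an assumption, not something the documentation prescribes. In a real pipeline the iterator would presumably build one of Mahout's own sparse vector types (for example RandomAccessSparseVector) instead of SparseDoc.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector (index -> weight); NOT Mahout's Vector.
final class SparseDoc {
    final Map<Integer, Double> entries = new TreeMap<>();
}

// Hypothetical Iterable that turns (term, weight) documents into sparse vectors.
final class TermWeightIterable implements Iterable<SparseDoc> {
    private final List<Map<String, Double>> docs;
    // Assumed scheme: each distinct term gets the next free integer index.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    TermWeightIterable(List<Map<String, Double>> docs) {
        this.docs = docs;
        for (Map<String, Double> doc : docs) {
            for (String term : doc.keySet()) {
                dictionary.putIfAbsent(term, dictionary.size());
            }
        }
    }

    Map<String, Integer> dictionary() {
        return dictionary;
    }

    @Override
    public Iterator<SparseDoc> iterator() {
        final Iterator<Map<String, Double>> it = docs.iterator();
        return new Iterator<SparseDoc>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public SparseDoc next() {
                // Translate each term to its dictionary index, keeping the weight.
                SparseDoc vector = new SparseDoc();
                for (Map.Entry<String, Double> e : it.next().entrySet()) {
                    vector.entries.put(dictionary.get(e.getKey()), e.getValue());
                }
                return vector;
            }
        };
    }
}

class TermWeightDemo {
    public static void main(String[] args) {
        Map<String, Double> d1 = new LinkedHashMap<>();
        d1.put("mahout", 0.8);
        d1.put("vector", 0.5);
        Map<String, Double> d2 = new LinkedHashMap<>();
        d2.put("vector", 0.3);
        d2.put("lda", 0.9);

        List<Map<String, Double>> docs = new ArrayList<>();
        docs.add(d1);
        docs.add(d2);

        // One sparse vector per document, indexed by dictionary position.
        for (SparseDoc v : new TermWeightIterable(docs)) {
            System.out.println(v.entries);
        }
    }
}
```

The dictionary is built once in the constructor so that the same term maps to the same index in every document, which is what a topic model needs in order to compare documents.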




