Posted to dev@mahout.apache.org by rahul raghavendhra <ra...@gmail.com> on 2011/12/28 13:17:44 UTC

Mahout sequence file format

I am new to Mahout. I just want to know how a text file is converted into
a sequence file and then into sparse vectors.
Can any kind of text file be converted into a sequence file using ./mahout
seqdirectory?

Thanks in advance.

./rahul

Re: Mahout sequence file format

Posted by Isabel Drost <is...@apache.org>.
On 28.12.2011 rahul raghavendhra wrote:
> I am new to Mahout. I just want to know how a text file is converted into
> a sequence file and then into sparse vectors.

For more detailed pointers on where to start see also

 <https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>


Isabel

Re: Mahout sequence file format

Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 28, 2011, at 7:17 AM, rahul raghavendhra wrote:

> I am new to Mahout. I just want to know how a text file is converted into
> a sequence file and then into sparse vectors.

There are quite a few steps.  I would recommend checking out the code and walking through it.  See the SparseVectorsFromSequenceFiles class as well as SequenceFilesFromDirectory.
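
As a rough orientation before reading those classes, the two stages can be sketched conceptually in plain Python. This is not Mahout's code and uses no Mahout or Hadoop APIs. The function names (read_directory, build_dictionary, to_sparse_tf), the lowercase alphanumeric tokenizer, and the use of raw term counts as weights are all assumptions made for illustration. The first stage turns a directory of text files into (key, text) pairs; the second tokenizes each text, assigns every term an integer index, and stores only the non-zero counts.

```python
import os
import re
import tempfile
from collections import Counter


def read_directory(root, encoding="utf-8"):
    """Stage 1 (conceptual): one (key, text) pair per file, keyed by relative path."""
    pairs = []
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            key = "/" + os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, encoding=encoding) as handle:
                pairs.append((key, handle.read()))
    return pairs


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(docs):
    """Stage 2a: map every distinct term in the corpus to an integer index."""
    terms = sorted({term for _key, text in docs for term in tokenize(text)})
    return {term: index for index, term in enumerate(terms)}


def to_sparse_tf(text, dictionary):
    """Stage 2b: sparse term-frequency vector as {term index: count}."""
    counts = Counter(tokenize(text))
    return {dictionary[term]: float(count) for term, count in counts.items()}


if __name__ == "__main__":
    # Build a tiny throwaway corpus so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as root:
        samples = [("a.txt", "the cat sat"), ("b.txt", "the cat and the dog")]
        for name, text in samples:
            with open(os.path.join(root, name), "w", encoding="utf-8") as out:
                out.write(text)
        docs = read_directory(root)

    dictionary = build_dictionary(docs)
    for key, text in docs:
        print(key, to_sparse_tf(text, dictionary))
```

The vectors are dictionaries holding only non-zero entries, which is what makes them "sparse": a document that uses three terms stores three entries regardless of vocabulary size. The real classes do considerably more than this sketch, so treat it only as a map of the overall flow.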


> Can any kind of text file be converted into a sequence file using ./mahout
> seqdirectory?

It works with plain text files.  I believe you can pass in the encoding of the file.  Is that what you are looking for?
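
To illustrate why the encoding matters for plain text input, here is a small Python sketch, independent of Mahout. The helper name decode_document and the sample word are made up for the example. It shows that the same bytes decode cleanly under the charset they were written in and fail under a different one.

```python
def decode_document(raw: bytes, charset: str = "utf-8") -> str:
    """Decode file bytes using the charset the caller says the file is in."""
    return raw.decode(charset)


if __name__ == "__main__":
    # Bytes as they might sit on disk for a file saved as ISO-8859-1.
    raw = "caf\u00e9".encode("iso-8859-1")

    # Decoding with the matching charset recovers the original text.
    print(decode_document(raw, "iso-8859-1"))

    # Decoding the same bytes as UTF-8 fails, because 0xE9 on its own
    # is not a complete UTF-8 sequence.
    try:
        decode_document(raw, "utf-8")
    except UnicodeDecodeError as err:
        print("wrong charset:", err.reason)
```

Any tool that reads text files has to make the same choice, so if the input is not in the default charset it is worth stating the encoding explicitly.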

-Grant