Posted to user@mahout.apache.org by David Scarlatti <d_...@yahoo.es> on 2012/09/17 11:18:03 UTC

Pointer to Reference Docs

Hi, I'd appreciate any hint on the best source of reference information.
I've found various examples and quick guides, but if I want to know, e.g.,
exactly what seqdirectory or seq2sparse does and what the different
command-line options are, with a detailed description, I can't find the place.
Is this something still to be done in Mahout? Should I look at the source code
to find this out?

Thanks in advance.

-- 
-----
David.

Re: Pointer to Reference Docs

Posted by Julian Ortega <jo...@gmail.com>.
The *seqdirectory* command takes every file in the specified directory and
makes a Hadoop Sequence File <http://wiki.apache.org/hadoop/SequenceFile>
out of it. Sequence Files hold key-value pairs; when you turn a list of
files into Sequence Files, the file name is the key and the file contents
are the value. However, this is quite impractical if your corpus is large,
as disk reading and writing can become painfully slow. You might want to
have a look at this discussion on StackOverflow
<http://stackoverflow.com/questions/11645294/how-can-i-use-mahouts-sequencefile-api-code/>,
which covers how to use the Sequence File API to transform a key-value
CSV file into Sequence Files.
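
To make the key/value idea concrete, here is a minimal Python sketch of the mapping described above: every file under a directory becomes one (file name, file contents) pair. This is not Mahout's code and it does not write a real Hadoop Sequence File; the function name directory_to_pairs, the use of the relative path as the key, and the UTF-8 encoding are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def directory_to_pairs(root):
    """Yield (key, value) pairs: relative file name -> file contents.

    Illustration only: a real Sequence File is a binary Hadoop format,
    and the exact key layout Mahout uses may differ from this sketch.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Key: file name relative to the input directory (assumption).
            key = path.relative_to(root).as_posix()
            # Value: the whole file read as UTF-8 text (assumption).
            yield key, path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Build a tiny throwaway corpus and print the resulting pairs.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.txt").write_text("first document", encoding="utf-8")
        (Path(tmp) / "b.txt").write_text("second document", encoding="utf-8")
        for key, value in directory_to_pairs(tmp):
            print(key, "->", value)
```

Note that every file is read in full, one after another, which is where the disk-bound slowness on a large corpus comes from.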

The *seq2sparse* Mahout shell command converts the text documents in
Sequence File format to vectors using either TF or TF-IDF
<http://en.wikipedia.org/wiki/Tf*idf> weighting with n-gram generation.
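
As a rough illustration of what TF and TF-IDF weighting with n-grams means, here is a small Python sketch. It is not Mahout's implementation: the whitespace tokenizer, the function names, and the textbook formula idf = log(N / df) are assumptions, and Mahout's own tokenization and weighting may well differ.

```python
import math
from collections import Counter


def ngrams(tokens, max_n):
    """Return all n-grams of size 1..max_n as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def tf_vectors(docs, max_n=1):
    """Map each document key to a sparse term-frequency vector (a Counter)."""
    # Naive lowercase + whitespace tokenization (assumption).
    return {key: Counter(ngrams(text.lower().split(), max_n))
            for key, text in docs.items()}


def tfidf_vectors(docs, max_n=1):
    """Weight each term frequency by the textbook idf = log(N / df)."""
    tf = tf_vectors(docs, max_n)
    n_docs = len(tf)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    return {key: {term: count * math.log(n_docs / df[term])
                  for term, count in counts.items()}
            for key, counts in tf.items()}


if __name__ == "__main__":
    docs = {"a.txt": "the cat sat", "b.txt": "the dog sat down"}
    print(tf_vectors(docs, max_n=2)["a.txt"])
    print(tfidf_vectors(docs)["a.txt"])
```

With this formula, a term that occurs in every document gets weight zero, while terms that occur in only a few documents are weighted up.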

I suggest looking at this quick tour
<https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html>
for now, but I would strongly recommend reading the Mahout in Action book
<http://manning.com/owen/>, specifically chapter 8.

Hope this helps


Re: Pointer to Reference Docs

Posted by Lance Norskog <go...@gmail.com>.
Yes, the source code is the best reference.


-- 
Lance Norskog
goksron@gmail.com