Posted to user@mahout.apache.org by Baoqiang Cao <bq...@gmail.com> on 2012/03/07 21:34:20 UTC

seq2sparse set min doc size

Hi,

I wonder if, in the seq2sparse step, I could set a criterion for the minimum
number of words (after stop-word removal) a document must have. Any help,
please?

Best,
Bao
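Whichever seq2sparse option turns out to apply, one way to enforce this criterion is to drop short documents before vectorizing. The sketch below is a minimal in-memory illustration under stated assumptions: `MinDocSizeFilter`, `countContentWords` and `filterByMinWords` are hypothetical names, and the naive split on non-letter/non-digit characters plus a hand-supplied stop-word set stand in for the Lucene analyzer seq2sparse is actually configured with.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-filter: keeps documents with at least minWords tokens
// left after stop-word removal. Not part of Mahout.
public class MinDocSizeFilter {

    // Counts tokens that survive stop-word removal. Tokenization is a naive,
    // assumed split on runs of non-letter/non-digit characters.
    static int countContentWords(String text, Set<String> stopWords) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield a leading empty string; skip it.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                count++;
            }
        }
        return count;
    }

    // Returns only the documents with at least minWords content words,
    // preserving input order.
    static Map<String, String> filterByMinWords(Map<String, String> docs,
                                                Set<String> stopWords,
                                                int minWords) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (countContentWords(e.getValue(), stopWords) >= minWords) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "The cat sat on the mat.");
        docs.put("doc2", "It is what it is.");
        Set<String> stopWords = Set.of("the", "on", "it", "is", "what");
        System.out.println(filterByMinWords(docs, stopWords, 2).keySet()); // prints [doc1]
    }
}
```

In a real pipeline the same check would presumably be applied while rewriting the <Text, Text> SequenceFile that seq2sparse reads, so that short documents never reach the vectorizer.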

Re: seq2sparse set min doc size

Posted by Lance Norskog <go...@gmail.com>.
That is the minSupport argument. seq2sparse is
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles, which
calls org.apache.mahout.vectorizer.DictionaryVectorizer. Look for
'minSupport' there and you'll see how it works.
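If I read the DictionaryVectorizer javadoc correctly, minSupport is the minimum number of times a term must occur in the entire corpus to be kept in the dictionary. That makes it a threshold on terms rather than on document length. The hypothetical `MinSupportSketch` below is a simplified in-memory model of such a corpus-wide term-frequency threshold, not Mahout's MapReduce implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a corpus-wide minimum-support threshold on terms.
public class MinSupportSketch {

    // Counts every term occurrence across all documents, then keeps only
    // terms seen at least minSupport times. Documents are not removed.
    static Map<String, Integer> pruneDictionary(List<List<String>> tokenizedDocs,
                                                int minSupport) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> doc : tokenizedDocs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        // Drop rare terms from the dictionary.
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("mahout", "vector", "mahout"),
                List.of("vector", "sparse"),
                List.of("hadoop"));
        System.out.println(pruneDictionary(docs, 2)); // prints {mahout=2, vector=2}
    }
}
```

In this toy corpus the one-word third document is not rejected for being short; it simply contributes nothing to the dictionary because its only term is rare.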



-- 
Lance Norskog
goksron@gmail.com